SPARSE RANDOM GRAPHS: REGULARIZATION AND
CONCENTRATION OF THE LAPLACIAN


CAN M. LE, ELIZAVETA LEVINA, AND ROMAN VERSHYNIN 

Abstract. We study random graphs with possibly different edge
probabilities in the challenging sparse regime of bounded expected degrees.
Unlike in the dense case, neither the graph adjacency matrix nor its
Laplacian concentrates around its expectation, due to the highly
irregular distribution of node degrees. It has been empirically observed
that simply adding a constant of order 1/n to each entry of the
adjacency matrix substantially improves the behavior of the Laplacian. Here
we prove that this regularization indeed forces the Laplacian to concentrate
even in sparse graphs. As an immediate consequence in network
analysis, we establish the validity of one of the simplest and fastest
approaches to community detection - regularized spectral clustering - under
the stochastic block model. Our proof of concentration of the regularized
Laplacian is based on Grothendieck's inequality and factorization, combined
with paving arguments.
1. Introduction

Concentration properties of random graphs have received substantial
attention in the probability literature. In statistics, applications of these
results to network analysis have been a particular focus of recent attention,
discussed in more detail in Section 3. For dense graphs (with expected
node degrees growing with the number of nodes n), a number of results are
available [37, 40, 14, 39, 30], mostly when the average expected node degree
grows faster than log n. Real networks, however, are frequently very sparse,
and no concentration results are available in the regime where the degrees
are bounded by a constant. This paper makes one of the first contributions
to the study of concentration in this challenging sparse regime.

1.1. Do random graphs concentrate? Geometry of graphs is reflected 
in matrices canonically associated to them, most importantly the adjacency 
and Laplacian matrices. Concentration of random graphs can be understood 
as concentration of these canonical random matrices around their means. 

To recall the notion of graph Laplacian, let A be the n × n adjacency
matrix of an undirected finite graph G on the vertex set V, |V| = n, with
$A_{ij} = 1$ if there is an edge between vertices i and j, and 0 otherwise. The
(symmetric, normalized) Laplacian is defined as$^1$

$$\mathcal{L}(A) = D^{-1/2}(D - A)D^{-1/2} = I - D^{-1/2} A D^{-1/2}. \tag{1.1}$$

Here I is the identity matrix, and $D = \mathrm{diag}(d_i)$ is the diagonal matrix with
degrees $d_i = \sum_{j \in V} A_{ij}$ on the diagonal. Graph Laplacians can be thought
of as discrete versions of the Laplace-Beltrami operators on Riemannian
manifolds; see [16].

The eigenvalues and eigenvectors of the Laplacian matrix $\mathcal{L}(A)$ reflect
some fundamental geometric properties of the graph G. The spectrum of
$\mathcal{L}(A)$, which is often called the graph spectrum, is a subset of the interval
[0,2]. The smallest eigenvalue is always zero. The spectral gap of G, which
is usually defined as the minimum of the second smallest eigenvalue and the
gap between 2 and the largest eigenvalue, provides a quantitative measure
of connectivity of G; see [16].
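
As a small illustration of definition (1.1) and of the spectral facts just
stated, the following sketch computes the normalized Laplacian with NumPy.
The function name `normalized_laplacian` and the example graph are our own
choices for illustration; isolated nodes are handled by zeroing their rows
and columns, which is how we read the footnote to (1.1).

```python
import numpy as np

def normalized_laplacian(A):
    """Symmetric normalized Laplacian I - D^{-1/2} A D^{-1/2}.

    Rows and columns of isolated nodes (degree 0) are set to zero,
    following the convention in the footnote to (1.1).
    """
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    positive = deg > 0
    inv_sqrt = np.zeros_like(deg)
    inv_sqrt[positive] = 1.0 / np.sqrt(deg[positive])
    # Identity on the non-isolated nodes only, zero elsewhere.
    identity_part = np.diag(positive.astype(float))
    return identity_part - inv_sqrt[:, None] * A * inv_sqrt[None, :]

# Illustrative graph: a path on three vertices plus one isolated vertex.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 0]])
eigenvalues = np.linalg.eigvalsh(normalized_laplacian(A))  # ascending order
```

For this graph the spectrum lies in [0,2], and the isolated vertex
contributes an additional zero eigenvalue.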

In this paper we will study Laplacians of random graphs. A classical
and well studied model of random graphs is the Erdős-Rényi model G(n,p),
where an undirected graph G on n vertices is constructed by connecting
each pair of vertices independently with a fixed probability p. Although the
main result of this paper is neither known nor trivial for G(n,p), we shall
work with a more general, inhomogeneous Erdős-Rényi model $G(n, (p_{ij}))$ in
which edges are still generated independently, but with different probabilities
$p_{ij}$; see e.g. [10]. This includes many popular network models as special
cases, including the stochastic block model [26]. We ask the following basic
question.

Question 1.1. When does the Laplacian of a random graph concentrate 
near a deterministic matrix? 

$^1$ We first define the Laplacian of the subgraph induced by non-isolated nodes using
(1.1) and then extend it to the whole graph by setting the new row and column entries
of isolated nodes to zero. However, we will only work with restrictions of the Laplacian
to nodes of positive degree.
More precisely, for a random graph drawn from the inhomogeneous
Erdős-Rényi model $G(n, (p_{ij}))$, we are asking whether, with high probability,

$$\|\mathcal{L}(A) - \mathcal{L}(\bar{A})\| \ll \|\mathcal{L}(\bar{A})\| \quad \text{for } \bar{A} = \mathbb{E} A = (p_{ij}).$$

Here $\|\cdot\|$ is the operator norm and $\mathcal{L}(\bar{A})$ is the Laplacian of the weighted
graph with adjacency matrix $\bar{A}$ (obtained by simply replacing $A$ with $\bar{A}$ in
the definition of the Laplacian). Since the proper scaling for $\|\mathcal{L}(\bar{A})\|$ is $\Theta(1)$
(except for a trivial graph with no edges), we can equivalently restate the
question as whether

$$\|\mathcal{L}(A) - \mathcal{L}(\bar{A})\| \ll 1.$$

The answer to this question is crucial in network analysis; see Section 3.

1.2. Dense graphs concentrate, sparse graphs do not. Concentration
of relatively dense random graphs - those whose expected degrees grow
at least as fast as log n - is well understood. Both the adjacency matrix and
the Laplacian concentrate in this regime. Indeed, Oliveira [37] showed that
the inhomogeneous Erdős-Rényi model satisfies

$$\|\mathcal{L}(A) - \mathcal{L}(\bar{A})\| = O\left(\sqrt{\frac{\log n}{d_0}}\right) \tag{1.2}$$

with high probability, where $d_0 = \min_{i \in V} \sum_{j \in V} p_{ij}$ denotes the smallest
expected degree of the graph. The concentration inequality (1.2) is
non-trivial when its right-hand side is o(1), i.e., $d_0 \gg \log n$.

Results like (1.2) for the Laplacian can be deduced from concentration
inequalities for the adjacency matrix A, combined with (simple) concentration
inequalities for the degrees of vertices. Concentration for adjacency matrices
can in turn be deduced either from matrix-valued deviation inequalities (as
in [37]) or from bounds for norms of random matrices (as in [25]).

For sparse random graphs, with bounded expected degrees, neither the
adjacency matrix nor the Laplacian concentrates, due to the high variance
of the degree distribution ([3, 21, 18]). High degree vertices make the
adjacency matrix unstable, and low degree vertices make the Laplacian unstable.
Indeed, a random graph in the Erdős-Rényi model G(n, p) has isolated vertices
with high probability if the expected degree d = np is o(log n). In this
case, the Laplacian $\mathcal{L}(A)$ has multiple zero eigenvalues, while $\mathcal{L}(\bar{A})$ has a
single eigenvalue at zero and all other eigenvalues at 1. This implies that
$\|\mathcal{L}(A) - \mathcal{L}(\bar{A})\| \ge 1$, so the Laplacian fails to concentrate. Moreover, there
are vertices with degrees $\gg d$ with high probability, which force $\|A\| \gg d$
while $\|\bar{A}\| = d$, so the adjacency matrix does not concentrate either.
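
The failure of the Laplacian to concentrate can be observed numerically.
The sketch below is only an illustration under our own choices (n = 500,
d = 2, a fixed random seed, and self-loops allowed so that the expectation
is exactly the constant matrix with entries p); the helper names are
hypothetical. An isolated vertex has a zero diagonal entry in the Laplacian
of A but a diagonal entry 1 - 1/n in the Laplacian of the expectation, which
already forces the deviation to be at least 1 - 1/n.

```python
import numpy as np

def sample_gnp(n, p, rng):
    """Symmetric 0/1 matrix with independent entries on and above the diagonal."""
    upper = np.triu(rng.random((n, n)) < p, k=0)
    return (upper | upper.T).astype(float)

def normalized_laplacian(A):
    """I - D^{-1/2} A D^{-1/2}, with zero rows/columns for isolated nodes."""
    deg = A.sum(axis=1)
    positive = deg > 0
    inv_sqrt = np.zeros_like(deg)
    inv_sqrt[positive] = 1.0 / np.sqrt(deg[positive])
    return np.diag(positive.astype(float)) - inv_sqrt[:, None] * A * inv_sqrt[None, :]

n, d = 500, 2.0                      # illustrative sizes, not from the paper
rng = np.random.default_rng(0)
A = sample_gnp(n, d / n, rng)
A_bar = np.full((n, n), d / n)       # expectation of A in G(n, p)

num_isolated = int((A.sum(axis=1) == 0).sum())
# Operator (spectral) norm of the difference of the two Laplacians.
laplacian_gap = np.linalg.norm(
    normalized_laplacian(A) - normalized_laplacian(A_bar), 2)
```

With bounded expected degree a constant fraction of the vertices is
isolated, so `laplacian_gap` stays of order one no matter how large n is.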

1.3. Regularization of sparse graphs. If the concentration of sparse
random graphs fails because the degree distribution is too irregular, we may
naturally ask if regularizing the graph in some way solves the problem. If
such a regularization is to work, it has to enforce spectrum stability and
concentration of the Laplacian, but also preserve the graph's geometry.
One simple way to deal with isolated and very low degree nodes, proposed
by [6] and analyzed by [27], is to add the same small number $\tau > 0$ to all
entries of the adjacency matrix A. That is, we replace A with

$$A_\tau := A + \tau \mathbf{1}\mathbf{1}^T \tag{1.3}$$

where $\mathbf{1}$ denotes the vector in $\mathbb{R}^n$ whose components are all equal to 1, and
then use the Laplacian of $A_\tau$ in all subsequent analysis. This regularization
creates weak edges (with weight $\tau$) between all previously disconnected
vertices, thus increasing all node degrees by $n\tau$. Another way to deal with low
degree nodes, proposed by [14] and studied theoretically by [39], is to add a
constant $n\tau$ directly to the diagonal of D in the definition (1.1).
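
The effect of regularization (1.3) is easy to check on a toy example. In
the sketch below the function names, the graph and the value of $\tau$ are
illustrative assumptions only. It confirms that every degree increases by
$n\tau$ and that, since all weights of $A_\tau$ are positive, the Laplacian
of $A_\tau$ has a single zero eigenvalue even when the original graph has
an isolated vertex.

```python
import numpy as np

def regularize(A, tau):
    """Return A_tau = A + tau * 1 1^T, i.e. add tau to every entry, as in (1.3)."""
    return np.asarray(A, dtype=float) + tau

def laplacian_positive_degrees(A):
    """I - D^{-1/2} A D^{-1/2}; assumes all row sums are positive."""
    deg = A.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(deg)
    return np.eye(len(A)) - inv_sqrt[:, None] * A * inv_sqrt[None, :]

# Illustrative graph: a path on three vertices plus one isolated vertex.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 0]], dtype=float)
n, tau = A.shape[0], 0.25            # tau chosen arbitrarily for the example

A_tau = regularize(A, tau)
eigs = np.linalg.eigvalsh(laplacian_positive_degrees(A_tau))
num_zero_eigenvalues = int(np.sum(np.abs(eigs) < 1e-9))
```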

Our paper answers the question of whether the regularization (1.3) leads
to concentration in the sparse case, by which we mean the case when all
node degrees are bounded. Note that for both regularizations described
above, the concentration holds trivially if we allow $\tau$ to be arbitrarily large.
Concentration was obtained in [39] and [27] when $n\tau$ grows at least as fast
as log n. However, when all expected node degrees are bounded, this
requirement will lead to $\tau \mathbf{1}\mathbf{1}^T$ dominating A. To apply the concentration
results obtained in [39, 27] to community detection, one needs the average
of expected node degrees to grow at least as fast as log n, although the minimum
expected degree can stay bounded. This is an unavoidable consequence of
using Oliveira's result [37], which gives a log n factor in the bound, and
makes the extension of these bounds to our case of all bounded degrees
difficult.

To the best of our knowledge, up to this point it has been unknown
whether any regularization creates informative concentration for the
adjacency matrix or the graph Laplacian in the sparse case. However, a different
Laplacian based on non-backtracking random walks was proposed in [29] and
analyzed theoretically in [11]; this can be thought of as an alternative and
more complicated form of regularization, since introducing non-backtracking
random walks also avoids isolated nodes and very low degree vertices such
as those attached to the core of the graph by just one edge (which includes
dangling trees). Other methods, which are related to the non-backtracking
random walks, are the belief propagation algorithm [20, 19] and the spectral
algorithm based on the Bethe Hessian matrix [41]. Although these
methods have been empirically shown to perform well in the sparse case, there
is no theoretical analysis available in that regime so far.

1.4. Sparse graphs concentrate after regularization. We will prove
that regularization (1.3) does enforce concentration of the Laplacian even
for graphs with bounded expected degrees. To formally state our result for
the inhomogeneous Erdős-Rényi model, we shall work with random matrices
of the following form.

Assumption 1.2. A is an n × n symmetric random matrix whose binary
entries are jointly independent on and above the diagonal, with $\mathbb{E} A = (p_{ij})$.
Let numbers $d \ge e$, $d_0 > 0$ and $\alpha$ be such that

$$\max_{i,j} n p_{ij} \le d, \qquad \min_{j} \sum_{i=1}^{n} p_{ij} \ge d_0, \qquad \frac{d}{d_0} \le \alpha.$$


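Theorem 1.3 (Concentration of the regularized Laplacian). Let A be a
random matrix satisfying Assumption 1.2 and denote $\bar{A}_\tau = \mathbb{E} A_\tau$. Then for
any $r \ge 1$, with probability at least $1 - n^{-r}$ we have

$$\|\mathcal{L}(A_\tau) - \mathcal{L}(\bar{A}_\tau)\| \le C r \alpha^2 \log^3(d) \left( \frac{1}{\sqrt{d}} + \frac{1}{\sqrt{n\tau}} \right) \quad \text{for any } \tau > 0.$$

Here C denotes an absolute constant.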


We will give a slightly stronger result in Theorem 8.4. The exponents of
r, $\alpha$ and of log d are certainly not optimal, and to keep the argument more
transparent, we did not try to optimize them. We do not know if the log d
term can be completely removed; however, in sparse graphs d and thus log d
are of constant order anyway.

Remark 1.4 (Concentration around the original Laplacian). It is important
to ask whether regularization does not destroy the original model - in other
words, whether $\mathcal{L}(\bar{A}_\tau)$ is close to $\mathcal{L}(\bar{A})$. If we choose the regularization
parameter $\tau$ so that $d \gg n\tau \gg 1$, it is easy to check that
$\|\mathcal{L}(\bar{A}_\tau) - \mathcal{L}(\bar{A})\| \ll 1$, thus regularization almost does not affect the
expected geometry of the graph. Together with Theorem 1.3 this implies that

$$\|\mathcal{L}(A_\tau) - \mathcal{L}(\bar{A})\| \ll 1.$$

In other words, regularization forces the Laplacian to stay near $\mathcal{L}(\bar{A})$, and
this would not happen without regularization.

Remark 1.5 (Weighted graphs). Since our arguments will be based on
probabilistic rather than graph-theoretic considerations, the assumption that A
has binary entries is not at all crucial. With small modifications, it can be
relaxed to matrices with entries that take values in the interval [0,1], and
possibly to more general distributions of entries. We do not pursue such
generalizations, in order to keep the arguments more transparent.

Remark 1.6 (Directed graphs). Theorem 1.3 also holds for directed graphs
(whose adjacency matrices are not symmetric and have all independent
entries) for a suitably modified definition of the Laplacian (1.1), with the two
appearances of D replaced by matrices of row and column degrees,
respectively. In fact, our proof starts from directed graphs and then generalizes to
undirected graphs.

1.5. Concentration on the core. As we noted in Section 1.2, sparse
random graphs fail to concentrate without regularization. We are going to show
that this failure is caused by just a few vertices, n/d of them. On the rest
of the vertices, which form what we call the core, both the adjacency
matrix and the Laplacian concentrate even without regularization. The idea of
constructing a graph core with large spectral gap has been exploited before.
Alon and Kahale [3] constructed a core for random 3-colorable graphs by
removing vertices with large degrees; Feige and Ofek [21] constructed a core
for G(n,p) in a similar way; Coja-Oghlan and Lanka [18] provided a
different construction for a somewhat more general model (random graphs with
given expected degrees), which in general cannot be used to model networks
with communities. Alon and co-authors [2] used Grothendieck's inequality
and SDP duality to construct a core; they showed that the discrepancy of a
graph, which measures how much it resembles a random graph with given
expected degrees, is determined by the spectral gap of the restriction of the
Laplacian on the core (and vice versa).

The following result gives our construction of the core for the general
inhomogeneous Erdős-Rényi model $G(n, (p_{ij}))$. As we will discuss further,
our method of core construction is very different from the previous works.

Theorem 1.7 (Concentration on the core). In the setting of Theorem 1.3,
there exists a subset J of [n] which contains all but at most n/d vertices,
and such that

(1) the adjacency matrix concentrates on J × J:

$$\|(A - \bar{A})_{J \times J}\| \le C r \sqrt{d} \log^3 d.$$

(2) the Laplacian concentrates on J x J: 

We will prove this result in Theorems 5.7 and 7.2 below. Note that
concentration of the Laplacian (part 2) follows easily from concentration of
the adjacency matrix (part 1). This is because most vertices of the graph
have degrees $\sim d$, so keeping only such vertices in the core we can relate
the Laplacian to the adjacency matrix as $\mathcal{L}(A) \approx I - \frac{1}{d} A$. This makes the
deviation of the Laplacian in Theorem 1.7 about d times smaller than the
deviation of the adjacency matrix.

The rest of this paper is organized as follows. Section 2 outlines the
steps we will take to prove the main Theorem 1.3. Section 3 discusses the
application of this result to community detection in networks. The proof is
broken up into the following sections: Section 4 states the results of
Grothendieck that we will use and applies them to the first core block (which
may not yet be as large as we need). Section 5 presents an expansion of the
core to the required size and proves that the adjacency matrix concentrates
there. Section 6 describes a decomposition of the residual of the graph (after
extracting the expanded core) that will allow us to control its behavior.
Sections 7 and 8 prove the result for the Laplacian, showing, respectively,
that it concentrates on the core and can be controlled on the residual, which
completes the proof of the main theorem. The proof of the corollary for
community detection is given in Section 9.
2. Outline of the argument

Our approach to proving Theorem 1.3 consists of the following two steps. 

1. Remove the few (n/d) problematic vertices from the graph. On the
rest of the graph - the core - the Laplacian concentrates even without
regularization, by Theorem 1.7.

2. Reattach the problematic vertices - the residual - back to the core, and
show that regularization provides enough stability so that the
concentration is not destroyed.

We will address these two tasks separately. 


2.1. Construction of the core. We start with the first step and outline
the proof of the adjacency part of Theorem 1.7 (the Laplacian part follows
easily, as already noted). Our construction of the core is based on the
following theorem which combines two results due to Grothendieck, his famous
inequality and a factorization theorem. This result states that the
operator norm of a matrix can be bounded by the $\ell_\infty \to \ell_1$ norm on a large
sub-matrix. This norm is defined for an m × k matrix B as

$$\|B\|_{\infty \to 1} = \max_{x \in \{-1,1\}^m,\ y \in \{-1,1\}^k} x^T B y. \tag{2.1}$$

This norm is equivalent to the cut norm, which is more frequently used in
the theoretical computer science community (see [22, 4, 28]).


Theorem 2.1 (Grothendieck). For every m × k matrix B and for any $\delta > 0$,
there exists a sub-matrix $B_{I \times J}$ with $|I| \ge (1 - \delta)m$ and $|J| \ge (1 - \delta)k$ and
such that

$$\|B_{I \times J}\| \le \frac{2 \|B\|_{\infty \to 1}}{\delta \sqrt{mk}}.$$


We will deduce and discuss this theorem in Section 4.1. The $\ell_\infty \to \ell_1$
norm is simpler to deal with than the operator norm, since the maximum of
the quadratic form in (2.1) is taken with respect to vectors x, y whose
coordinates are all ±1. This can be helpful when B is a random matrix. Indeed,
for $B = A - \bar{A}$, one can first use standard concentration inequalities
(Bernstein's) to control $x^T B y$ for fixed x and y, and afterwards apply the union
bound over the $2^{2n}$ possible choices of x, y. This simple argument shows
that, while concentration fails in the operator norm, adjacency matrices of
sparse graphs concentrate in the $\ell_\infty \to \ell_1$ norm:

$$\|A - \bar{A}\|_{\infty \to 1} = O(n\sqrt{d}) \quad \text{with high probability.} \tag{2.2}$$

To see this is a concentration result, note that for large d the right hand side
is much smaller than $\|\bar{A}\|_{\infty \to 1}$, which is of order nd. This fact was observed
in [24], and we include the proof in this paper as Lemma 4.6.

Next, applying Grothendieck's Theorem 2.1 with m = k = n and $\delta = 1/20$,
we obtain a subset $J_1$ which contains all but 0.1n vertices, on which
the adjacency matrix concentrates:

$$\|(A - \bar{A})_{J_1 \times J_1}\| = O(\sqrt{d}). \tag{2.3}$$

Again, to understand this as concentration, note that for large d the right
hand side is much smaller than $\|\bar{A}_{J_1 \times J_1}\|$, which is of order d.

We obtained the concentration inequality claimed in the adjacency part
of Theorem 1.7, but with a core that may not be as large as we claimed.
Our next goal is to reduce the number of residual vertices from 0.1n to
n/d. To expand the core, we continue to apply the argument above to the
remainder of the matrix, thus obtaining new core blocks. We repeat this
process until the core becomes as large as required. At the end, all the core
blocks constructed this way are combined using the triangle inequality, at
the small cost of a factor polylogarithmic in d.

2.2. Controlling the regularized Laplacian on the residual. The
second step is to show that the regularized Laplacian $\mathcal{L}(A_\tau)$ is stable with respect
to adding a few vertices. We will quickly deduce such stability from the
following sparse decomposition of the adjacency matrix.

Theorem 2.2 (Sparse decomposition). In the setting of Theorem 1.3, we
can decompose any sub-matrix $A_{I \times J}$ with at most n/d rows or columns into
two matrices with disjoint support,

$$A_{I \times J} = A_{\mathcal{C}} + A_{\mathcal{R}},$$

in such a way that each row of $A_{\mathcal{R}}$ and each column of $A_{\mathcal{C}}$ will have at most
$10 r \log d$ entries that equal 1.

We will obtain a slightly more informative version of this result as
Theorem 6.4. The proof is not difficult. Indeed, using a standard concentration
argument it is possible to show that there exists at least one sparse row
or column of $A_{I \times J}$. Then we can iterate the process - remove this row or
column and find another one from the smaller sub-matrix, etc. The removed
rows and columns form $A_{\mathcal{R}}$ and $A_{\mathcal{C}}$, respectively.
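
The peeling procedure just described can be sketched in a few lines. The
version below is our own illustration, not the construction used in the
proof: it greedily removes whichever remaining row or column currently has
the fewest ones (an assumed selection rule), and it reports the largest
count k met along the way instead of comparing against the threshold
$10 r \log d$ of Theorem 2.2. The function name is hypothetical.

```python
import numpy as np

def sparse_decomposition(M):
    """Peel off the sparsest remaining row or column of a 0/1 matrix.

    Returns (M_R, M_C, k) with M = M_R + M_C, disjoint supports, at most k
    ones in every row of M_R and at most k ones in every column of M_C.
    """
    M = np.asarray(M, dtype=int)
    rows = list(range(M.shape[0]))   # rows not yet removed
    cols = list(range(M.shape[1]))   # columns not yet removed
    M_R = np.zeros_like(M)
    M_C = np.zeros_like(M)
    k = 0
    while rows and cols:
        block = M[np.ix_(rows, cols)]
        row_counts = block.sum(axis=1)
        col_counts = block.sum(axis=0)
        r, c = int(row_counts.argmin()), int(col_counts.argmin())
        if row_counts[r] <= col_counts[c]:
            i = rows.pop(r)
            # Entries of row i in the remaining columns go to the row part.
            M_R[i, cols] = M[i, cols]
            k = max(k, int(row_counts[r]))
        else:
            j = cols.pop(c)
            # Entries of column j in the remaining rows go to the column part.
            M_C[rows, j] = M[rows, j]
            k = max(k, int(col_counts[c]))
    return M_R, M_C, k

# Illustrative input: a short, wide, sparse 0/1 matrix.
rng = np.random.default_rng(0)
M = (rng.random((25, 200)) < 0.02).astype(int)
M_R, M_C, k = sparse_decomposition(M)
```

Each entry is assigned when the first of its row or its column is removed,
which is why the two parts have disjoint supports and add up to M.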

To use Theorem 2.2 for our purpose, it would be easier to drop the
identity from the definition of the Laplacian. Thus we consider the averaging
operator

$$L(A) := I - \mathcal{L}(A) = D^{-1/2} A D^{-1/2}, \tag{2.4}$$

which is occasionally also called the Laplacian. We show that the regularized
averaging operator is small (in the operator norm) on all sub-matrices with
small dimensions.

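Theorem 2.3 (Residual). In the setting of Theorem 1.3, any sub-matrix
$L(A_\tau)_{I \times J}$ with at most n/d rows or columns satisfies

$$\|L(A_\tau)_{I \times J}\| \le \frac{2}{\sqrt{d}} + \frac{\sqrt{10 r \log d}}{\sqrt{n\tau}} \quad \text{for any } \tau > 0.$$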
We will prove this result as Theorem 8.2 below. The proof is based
on the sparse decomposition constructed in Theorem 2.2 and proceeds as
follows. It is enough to bound the norm of $L(A_\tau)_{\mathcal{R}}$. By definition (2.4), $L(A)$
normalizes each entry of A by the sums of entries in its row and column. It
is not difficult to see that Laplacians must scale accordingly, namely

$$\|L(A_\tau)_{\mathcal{R}}\| \le \sqrt{\varepsilon}\, \|L(A_\tau)_{I \times J}\| \tag{2.5}$$

if the sum of entries in each column of $L(A_\tau)_{\mathcal{R}}$ is at most $\varepsilon$ times
the corresponding sum for $L(A_\tau)$. Let us assume that $L(A_\tau)_{I \times J}$
has n/d rows. The sum of entries of each column of $L(A_\tau)_{\mathcal{R}}$ is at most
$n\tau/d + 10 r \log d$. (The first term here comes from adding the regularization
parameter $\tau$ to each of n/d entries of the column, and the second term comes
from the sparsity of $\mathcal{R}$.) The sum of entries of each column of $L(A_\tau)_{I \times J}$ is
at least $n\tau$ due to regularization. Substituting this into (2.5), we obtain

$$\|L(A_\tau)_{\mathcal{R}}\| \le \sqrt{\frac{n\tau/d + 10 r \log d}{n\tau}}\; \|L(A_\tau)_{I \times J}\|.$$

Since the norm of L is always bounded by 1, this leads to the conclusion of
Theorem 2.3.

Finally, Theorem 1.3 follows by combining the core part (Theorem 1.7)
with the residual part (Theorem 2.3). To do this, we decompose the part of
the Laplacian outside the core J × J into two residual matrices, one on $J^c \times [n]$
and another on $J \times J^c$. We use that the regularized Laplacian concentrates
on the core and is small on each of the residual matrices. Combining these
bounds by the triangle inequality, we obtain Theorem 1.3.

3. Community detection in sparse networks 

3.1. Stochastic models of complex networks. Concentration results for
random graphs have remarkable implications for network analysis,
specifically for understanding the behavior of spectral clustering applied to the
community detection problem. Real world networks are often modelled as
random graphs, and a central task is finding communities, that is, groups of
nodes that behave similarly to each other. Most of the models proposed for
modeling communities to date are special cases of the inhomogeneous
Erdős-Rényi model, which we discussed in Section 1.4. In particular, the
stochastic block model [26] assigns one of K possible community (block)
labels to each node i, which we will call $c_i \in \{1, \ldots, K\}$, and then
assumes that the probability of an edge is $p_{ij} = B_{c_i c_j}$, where B is a
symmetric K × K matrix containing the probabilities of edges within and
between communities.

For simplicity of presentation, we focus on the simplest version of the
stochastic block model, also known as the balanced planted partition model,
which assumes K = 2, $B_{11} = B_{22} = p$, $B_{12} = q$, and the two communities
contain the same number of nodes (we assume that n is an even number and
split the set of vertices into two equal parts $C_1$ and $C_2$). We further assume
that p > q, and thus on average there are more edges within communities
than between them. (This is called an assortative network model; the
disassortative case p < q can in principle be treated similarly but we will
not consider it here.) We call this model of random graphs G(n, p, q).
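
For concreteness, here is a minimal sketch of how one might sample from
G(n, p, q). The function name is hypothetical, and excluding self-loops is
our own simplifying assumption (Assumption 1.2 also allows independent
diagonal entries).

```python
import numpy as np

def sample_planted_partition(n, p, q, rng):
    """Sample G(n, p, q): two equal communities, edge probability p within
    a community and q between communities. No self-loops (an assumption)."""
    if n % 2 != 0:
        raise ValueError("n must be even")
    labels = np.repeat([0, 1], n // 2)        # c_i for each node i
    B = np.array([[p, q], [q, p]])
    P = B[labels][:, labels]                  # P[i, j] = B[c_i, c_j]
    upper = np.triu(rng.random((n, n)) < P, k=1)
    A = (upper | upper.T).astype(int)         # symmetrize
    return A, labels

# Illustrative parameters only.
A, labels = sample_planted_partition(200, 0.3, 0.05, np.random.default_rng(1))
within_density = A[:100, :100].mean()
between_density = A[:100, 100:].mean()
```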

3.2. The community detection problem. The community detection
problem is to recover the node labels $c_i$, i = 1, ..., n, from a single realization of
the random graph model, in our case G(n, p, q), or in more common
notation, $G(n, \frac{a}{n}, \frac{b}{n})$. A large literature exists on both the detection algorithms
and the theoretical results establishing when detection is possible, with the
latter mostly confined to the simplest $G(n, \frac{a}{n}, \frac{b}{n})$ model. A conjecture was
made in the physics literature [19] and rigorous results established in a series
of papers by Mossel, Neeman and Sly, as well as independently by two other
groups - see [35, 36, 34, 1, 32]. It is now known that no method can do
better than random guessing unless

$$(a - b)^2 > 2(a + b).$$

Further, weak consistency (fraction of mislabelled nodes going to 0 with
high probability) is achievable if and only if $(a - b)^2/(a + b) \to \infty$, and
strong consistency, or exact recovery (labelling all nodes correctly with high
probability) requires a stronger necessary and sufficient condition given by
[34] in terms of certain binomial probabilities, which is satisfied when the
average expected degree $\frac{1}{2}(a + b)$ is of order log n or larger, and a and b
are sufficiently separated. Most existing results on community detection are
obtained in the latter regime, showing exact recovery is possible when the
degree grows faster than log n - see e.g., [33, 9].
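
As a quick worked check of the detection threshold, the following helper
(a hypothetical name, written only for illustration) tests whether a pair
(a, b) satisfies $(a - b)^2 > 2(a + b)$.

```python
def above_detection_threshold(a, b):
    """True if (a - b)^2 > 2 (a + b) for the model G(n, a/n, b/n)."""
    return (a - b) ** 2 > 2 * (a + b)

# a = 5, b = 1: (a - b)^2 = 16 and 2 (a + b) = 12, so the condition holds.
strong_signal = above_detection_threshold(5, 1)
# a = 3, b = 2: (a - b)^2 = 1 and 2 (a + b) = 10, so the condition fails.
weak_signal = above_detection_threshold(3, 2)
```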

There are very few existing results about community detection on sparse
graphs with bounded average degrees. Consistency is no longer possible,
but one can still hope to do better than random guessing above the
detection threshold. A (quite complicated) adaptive spectral algorithm by
Coja-Oghlan [17] achieves community detection if

$$(a - b)^2 > C(a + b) \log(a + b)$$

for a sufficiently large constant C. Recently, two other spectral algorithms
based on non-backtracking random walks were proposed by Mossel, Neeman
and Sly [35] and Massoulié [32], which perform detection better than random
guessing (the fraction of misclassified vertices is bounded away from 0.5 as
$n \to \infty$ with high probability) as long as

$$(a - b)^2 > C(a + b) \quad \text{for } C \ge 2. \tag{3.1}$$

Finally, semi-definite programming approaches to community detection have
been discussed and analyzed in the dense regime [15, 13, 7], and very recently
Guédon and Vershynin [24] proved that they achieve community detection in
the sparse regime under the same condition (3.1), also using Grothendieck's
results.
3.3. Regularized spectral clustering in the sparse regime. As an
application of the new concentration results, we show that regularized
spectral clustering [5] can be used for community detection under the
$G(n, \frac{a}{n}, \frac{b}{n})$ model in the sparse regime. Strictly speaking, regularized
spectral clustering is performed by first computing the leading K eigenvectors
and then applying the K-means clustering algorithm to estimate node labels,
but since we are focusing on the case K = 2, we will simply show that the
signs of the elements of the eigenvector corresponding to the second smallest
eigenvalue (under the $G(n, \frac{a}{n}, \frac{b}{n})$ model the eigenvector corresponding to
the smallest eigenvalue 0 does not contain information about the community
structure) match the partition into communities with high probability.
Passing from a concentration result on the Laplacian to a result about K-means
clustering on its eigenvectors can be done by standard tools such as those
used in [39] and is omitted here.

Corollary 3.1 (Community detection in sparse graphs). Let $\varepsilon \in (0,1)$ and
$r \ge 1$. Let A be the adjacency matrix drawn from the stochastic block model
$G(n, \frac{a}{n}, \frac{b}{n})$. Assume that a > b, $a \ge e$, and

$$(a - b)^2 \ge C r^2 \varepsilon^{-2} (a + b) \log^6 a \tag{3.2}$$

for some large constant C > 0. Choose $\tau = (d_1 + \cdots + d_n)/n^2$, where
$d_1, \ldots, d_n$ are degrees of the vertices. Denote by $v$ and $\bar{v}$ the unit-norm
eigenvectors associated to the second smallest eigenvalues of $\mathcal{L}(A_\tau)$ and
$\mathcal{L}(\bar{A}_\tau)$, respectively. Then with probability at least $1 - n^{-r}$, we have

$$\min_{\beta = \pm 1} \|v + \beta \bar{v}\|_2 \le \varepsilon.$$

In particular, the signs of the elements of v correctly estimate the partition
into the two communities, up to at most $\varepsilon n$ misclassified vertices.

Let us briefly explain how Corollary 3.1 follows from the new
concentration results. According to Theorem 1.3 and the standard perturbation
results (Davis-Kahan theorem), the eigenvectors of $\mathcal{L}(A_\tau)$ approximate the
corresponding eigenvectors of $\mathcal{L}(\bar{A}_\tau)$ and therefore of $\mathcal{L}(\bar{A})$. The latter
matrix differs from the identity by a matrix of rank two. The trivial
eigenvector of $\mathcal{L}(\bar{A}_\tau)$ is $\mathbf{1}$, with all entries equal to 1. The first
non-trivial eigenvector has entries 1 and -1, and it is constant on each of
the two communities. Since we have a good approximation
of that eigenvector, we can recover the communities.
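
The sign-based procedure of Corollary 3.1 can be sketched as follows. This
is an illustration under our own assumptions: the function name is
hypothetical, the graph has no self-loops, and the parameters n = 400,
a = 30, b = 3 were chosen only so that the example has a clear signal; the
choice $\tau = (d_1 + \cdots + d_n)/n^2$ follows the corollary.

```python
import numpy as np

def regularized_spectral_bipartition(A, tau=None):
    """Split nodes by the signs of the eigenvector for the second smallest
    eigenvalue of the Laplacian of A_tau = A + tau * 1 1^T.
    Assumes the graph has at least one edge when tau is not given."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if tau is None:
        tau = A.sum() / n**2                  # (d_1 + ... + d_n) / n^2
    A_tau = A + tau
    inv_sqrt = 1.0 / np.sqrt(A_tau.sum(axis=1))
    lap = np.eye(n) - inv_sqrt[:, None] * A_tau * inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)             # eigenvalues in ascending order
    return (vecs[:, 1] > 0).astype(int)

# Illustrative experiment on G(n, a/n, b/n) with a fixed seed.
rng = np.random.default_rng(0)
n, a, b = 400, 30.0, 3.0
labels = np.repeat([0, 1], n // 2)
P = np.where(labels[:, None] == labels[None, :], a / n, b / n)
upper = np.triu(rng.random((n, n)) < P, k=1)
A = (upper | upper.T).astype(float)

estimate = regularized_spectral_bipartition(A)
mismatch = float(np.mean(estimate != labels))
error_rate = min(mismatch, 1.0 - mismatch)    # labels are defined up to a swap
```

The eigenvector is determined only up to sign, so the error is measured up
to a swap of the two community labels.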

Remark 3.2 (Alternative regularization). A different natural regularization
[14, 39] we briefly mentioned in Section 1.3 is to add a constant, say $n\tau$, to
the diagonal of the degree matrix D in the definition of the Laplacian rather
than to the adjacency matrix A. Thus we have the alternative regularized
Laplacian $I - D_\tau^{-1/2} A D_\tau^{-1/2}$, where $D_\tau = D + n\tau I$. One can think of this
regularization as adding a few stars to the graph. Suppose for simplicity that
$n\tau$ is an integer. It is easy to check that this version of regularized Laplacian
can also be obtained as follows: add $n\tau$ new vertices, connect each of them
to all existing vertices, compute the (ordinary) Laplacian of the resulting
graph, and restrict it to the original n vertices. It is straightforward to show
a version of Theorem 1.3 holds for this regularization as well; we omit it
here out of space considerations.
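
The equivalence stated in Remark 3.2 can be verified numerically on a small
example. In the sketch below the graph, its size and the integer m (playing
the role of $n\tau$) are arbitrary illustrative choices, and the helper name
is hypothetical.

```python
import numpy as np

def laplacian(A):
    """I - D^{-1/2} A D^{-1/2}; assumes all degrees are positive."""
    inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return np.eye(len(A)) - inv_sqrt[:, None] * A * inv_sqrt[None, :]

rng = np.random.default_rng(0)
n, m = 8, 3                                   # m plays the role of n * tau
upper = np.triu(rng.random((n, n)) < 0.3, k=1)
A = (upper | upper.T).astype(float)

# Degree-regularized Laplacian I - D_tau^{-1/2} A D_tau^{-1/2}, D_tau = D + m I.
deg_tau = A.sum(axis=1) + m
direct = np.eye(n) - A / np.sqrt(np.outer(deg_tau, deg_tau))

# Augmented graph: m new vertices, each joined to all n original vertices.
augmented = np.zeros((n + m, n + m))
augmented[:n, :n] = A
augmented[:n, n:] = 1.0
augmented[n:, :n] = 1.0
restricted = laplacian(augmented)[:n, :n]     # restrict to original vertices
```

Each original vertex gains exactly m neighbours, which is why the restricted
block coincides with the degree-regularized Laplacian.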


4. Grothendieck's theorem and the first core block

Our arguments will be easier to develop for non-symmetric adjacency
matrices, which have all independent entries. One can think of them as
adjacency matrices of directed random graphs. So most of our analysis will
concern directed graphs, but at the end of some sections we will discuss
undirected graphs.

We are about to start proving the adjacency part of Theorem 1.7, first for
directed graphs. Our final result will be a little stronger, see Theorems 5.6
and 5.7 below, and it will hold under the following weaker assumptions on
A.

Assumption 4.1 (Directed graph, bounded expected average degree). A is
an n × n random matrix with independent binary entries, and $\mathbb{E} A = (p_{ij})$.
Let number $d \ge e$ be such that

$$\frac{1}{n} \sum_{i,j=1}^{n} p_{ij} \le d.$$

In other words, we shall consider a directed random graph whose expected
average degree is bounded by d.

In this section, we construct the first core block - one that misses 0.1n
vertices and on which the adjacency matrix is concentrated as we explained
in Section 2.1. The construction will be based on two theorems of
Grothendieck.


4.1. Grothendieck's theorems. Grothendieck's inequality is a
fundamental result, which was originally proved in [23] and formulated in [31] in the
form we are going to use in this paper. Grothendieck's inequality has found
applications in many areas [2, 38, 28], and most recently in the analysis of
networks [24].


Theorem 4.2 (Grothendieck's inequality). Consider an m × k matrix of
real numbers $B = (B_{ij})$. Assume that for all numbers $s_i, t_j \in \{-1, 1\}$, one
has

$$\Big| \sum_{i,j} B_{ij} s_i t_j \Big| \le 1.$$

Then, for any Hilbert space H and all vectors $x_i, y_j$ in H with norms at most
1, one has

$$\Big| \sum_{i,j} B_{ij} \langle x_i, y_j \rangle \Big| \le K_G.$$
Here $K_G$ is an absolute constant usually called Grothendieck's constant.
The best value of $K_G$ is still unknown, and the best known bound is
$K_G < \pi/(2 \ln(1 + \sqrt{2})) \le 1.783$.

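It will be useful to formulate Grothendieck's inequality in terms of the
$\ell_\infty \to \ell_1$ norm, which is defined as

$$\|B\|_{\infty \to 1} = \max_{\|t\|_\infty \le 1} \|Bt\|_1 = \max_{s \in \{-1,1\}^m,\ t \in \{-1,1\}^k} s^T B t = \max_{s_i, t_j \in \{-1,1\}} \sum_{i,j} B_{ij} s_i t_j. \tag{4.1}$$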


Grothendieck's inequality then states that for any m × k matrix B, any
Hilbert space H and all vectors $x_i, y_j$ in H with norms at most 1, one has

$$\Big| \sum_{i,j} B_{ij} \langle x_i, y_j \rangle \Big| \le K_G \|B\|_{\infty \to 1}.$$
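
For small matrices both sides of this inequality can be computed directly.
The sketch below is an illustration with hypothetical function names and a
randomly generated test matrix: it evaluates $\|B\|_{\infty \to 1}$ by brute
force over sign vectors and compares it with the bilinear form on random
unit vectors, using the numerical bound 1.783 on $K_G$ quoted above.

```python
import itertools
import numpy as np

def norm_inf_to_1(B):
    """Brute-force ||B||_{inf->1}: maximum of s^T B t over sign vectors s, t."""
    B = np.asarray(B, dtype=float)
    best = 0.0
    for s in itertools.product([-1.0, 1.0], repeat=B.shape[0]):
        # For fixed s the best t is sign(B^T s), which gives ||B^T s||_1.
        best = max(best, float(np.abs(np.asarray(s) @ B).sum()))
    return best

def bilinear_form(B, X, Y):
    """sum_{i,j} B_ij <x_i, y_j>, where x_i, y_j are the rows of X, Y."""
    return float(np.sum(B * (X @ Y.T)))

rng = np.random.default_rng(0)
B = rng.standard_normal((6, 7))               # illustrative test matrix
X = rng.standard_normal((6, 5))
Y = rng.standard_normal((7, 5))
X /= np.linalg.norm(X, axis=1, keepdims=True) # unit vectors x_i
Y /= np.linalg.norm(Y, axis=1, keepdims=True) # unit vectors y_j

lhs = abs(bilinear_form(B, X, Y))
rhs = 1.783 * norm_inf_to_1(B)
small_example = norm_inf_to_1(np.array([[1.0, 1.0], [1.0, -1.0]]))
```

The brute-force loop has $2^m$ iterations, so it is only meant for very
small matrices.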




Remark 4.3 (Cut norm). The $\ell_\infty \to \ell_1$ norm is equivalent to the
cut norm, which is often used in theoretical computer science literature (see
[4, 28]), and which is defined as the maximal sum of entries over all
sub-matrices of B. The cut norm is obtained if we allow $s_i$ and $t_j$ in
(4.1) to take values in {0,1} as opposed to {−1,1}. When A is the adjacency
matrix of a random graph and $\bar A = \mathbb{E} A$, the cut norm of
$A - \bar A$ measures the degree of “randomness” of the graph, as it controls
the fluctuation of the number of edges that run between any two subsets of
vertices.
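For very small matrices, both norms in Remark 4.3 can be computed by brute force, which makes the equivalence concrete. The following is a minimal sketch, not part of the original argument: the function names `inf_to_one_norm` and `cut_norm` and the test matrix `B` are illustrative choices, and the cut norm is taken here with an absolute value around the sub-matrix sum (the usual convention). The standard comparison is cut norm ≤ $\ell_\infty\to\ell_1$ norm ≤ 4 · cut norm.

```python
from itertools import product


def inf_to_one_norm(B):
    """Brute-force max over sign vectors s, t of sum_ij B[i][j] * s[i] * t[j]."""
    m, k = len(B), len(B[0])
    best = 0
    for t in product((-1, 1), repeat=k):
        # For a fixed t the best s_i is the sign of (Bt)_i, which gives ||Bt||_1.
        best = max(best, sum(abs(sum(B[i][j] * t[j] for j in range(k)))
                             for i in range(m)))
    return best


def cut_norm(B):
    """Brute-force max over 0/1 vectors s, t of |sum_ij B[i][j] * s[i] * t[j]|."""
    m, k = len(B), len(B[0])
    best = 0
    for t in product((0, 1), repeat=k):
        col = [sum(B[i][j] * t[j] for j in range(k)) for i in range(m)]
        # For a fixed column set, keep either all positive or all negative rows.
        pos = sum(c for c in col if c > 0)
        neg = -sum(c for c in col if c < 0)
        best = max(best, pos, neg)
    return best


# Hypothetical 3 x 3 example matrix.
B = [[1, -1, 0],
     [-1, 1, 0],
     [0, 0, 1]]
print(inf_to_one_norm(B), cut_norm(B))  # 5 2
```

Both loops are exponential in the number of columns, so this is only meant for hand-sized examples. For a matrix with non-negative entries the first function returns the sum of all entries, which is the fact used for the expected matrix in Remark 4.7.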


We combine Grothendieck’s inequality with another result of A. Grothendieck
(see [23, 38]), which characterizes the matrices B for which
$\sum_{i,j} B_{ij} \langle x_i, y_j \rangle$ is small for all vectors
$x_i, y_j$ with norms at most 1.


Theorem 4.4 (Grothendieck’s factorization). Consider an $m \times k$ matrix
of real numbers $B = (B_{ij})$. Assume that for any Hilbert space H and all
vectors $x_i, y_j$ in H with norms at most 1, one has

$$\Big| \sum_{i,j} B_{ij} \langle x_i, y_j \rangle \Big| \le 1.$$

Then there exist positive weights $\mu_i$ and $\mu'_j$ that satisfy
$\sum_{i=1}^m \mu_i = 1$ and $\sum_{j=1}^k \mu'_j = 1$ such that

$$\|D_\mu^{-1/2} B D_{\mu'}^{-1/2}\| \le 1,$$

where $D_\mu = \mathrm{diag}(\mu_i)$ and $D_{\mu'} = \mathrm{diag}(\mu'_j)$
denote the diagonal matrices with the weights on the diagonal.


Combining Grothendieck’s inequality and factorization, we deduce a result
that allows one to control the usual (operator) norm by the
$\ell_\infty\to\ell_1$ norm on almost all of the matrix. We already mentioned
this result as Theorem 2.1. Let us recall it again and give a proof.
Theorem 4.5 (Grothendieck). For every $m \times k$ matrix B and for any
$\delta > 0$, there exists a sub-matrix $B_{I \times J}$ with
$|I| \ge (1-\delta)m$ and $|J| \ge (1-\delta)k$ and such that

$$\|B_{I\times J}\| \le \frac{2\|B\|_{\infty\to 1}}{\delta\sqrt{mk}}.$$

Proof. Combining Theorems 4.2 and 4.4, we obtain positive weights $\mu_i$ and
$\mu'_j$ which sum to 1 and satisfy

$$\|D_\mu^{-1/2} B D_{\mu'}^{-1/2}\| \le K_G \|B\|_{\infty\to 1}. \qquad (4.2)$$

Let us choose the set I to contain the indices of the weights $\mu_i$ that are
bounded above by $(\delta m)^{-1}$. Since all weights sum to one, fewer than
$\delta m$ of them can exceed $(\delta m)^{-1}$, so I contains at least
$(1-\delta)m$ indices as required. Similarly, we define J to contain the
indices of the weights $\mu'_j$ that are bounded above by $(\delta k)^{-1}$;
this set also has the required cardinality.

By construction, all (diagonal) entries of $D_\mu^{-1/2}$ and
$D_{\mu'}^{-1/2}$ indexed by I and J are positive and bounded below by
$\sqrt{\delta m}$ and $\sqrt{\delta k}$ respectively. This implies that

$$\|(D_\mu^{-1/2} B D_{\mu'}^{-1/2})_{I\times J}\| \ge \delta\sqrt{mk}\, \|B_{I\times J}\|.$$

On the other hand, by (4.2) the left hand side of this inequality is bounded
above by $K_G\|B\|_{\infty\to 1}$. This completes the proof, since
Grothendieck’s constant $K_G$ is bounded by 2. □


4.2. Concentration of adjacency matrices in $\ell_\infty\to\ell_1$ norm. As
we explained in Section 1.2, the adjacency matrices of sparse random graphs
do not concentrate in the operator norm. Remarkably, concentration can be
enforced by switching to the $\ell_\infty\to\ell_1$ norm. We stated an
informal version of this result in (2.2); now we are ready for a formal
statement. It has been proved in [24]; let us restate and prove it here for
the reader’s convenience.

Lemma 4.6 (Concentration of adjacency matrices in $\ell_\infty\to\ell_1$
norm). Let A be a random matrix satisfying Assumption 4.1. Then for any
$r \ge 1$ the
following holds with probability at least 1 — 

$$\|A - \bar A\|_{\infty\to 1} \le 5rn\sqrt{d}.$$

Proof. By definition,

$$\|A - \bar A\|_{\infty\to 1} = \max_{x, y \in \{-1,1\}^n} \sum_{i,j=1}^n (A_{ij} - \bar A_{ij}) x_i y_j. \qquad (4.3)$$

For a fixed pair x, y, the terms $X_{ij} := (A_{ij} - \bar A_{ij}) x_i y_j$ are
independent random variables. So we can use Bernstein’s inequality (see
Theorem 2.10 in [12]) to control the sum $\sum_{i,j=1}^n X_{ij}$. There are
$n^2$ terms here, all of them are bounded in absolute value by one, and their
average variance is at most d/n. Therefore by Bernstein’s inequality, for any
t > 0 we have

$$\mathbb{P}\Big\{ \frac{1}{n^2} \sum_{i,j=1}^n X_{ij} > t \Big\} \le \exp\Big( -\frac{n^2 t^2/2}{d/n + t/3} \Big). \qquad (4.4)$$


It is easy to check that this is further bounded by if we choose t = 

$5r\sqrt{d}/n$. Thus, taking a union bound over $4^n$ choices of pairs x, y
and using (4.3) and (4.4), we obtain that

$$\|A - \bar A\|_{\infty\to 1} \le tn^2 = 5rn\sqrt{d} \qquad (4.5)$$

with probability at least 1 — 4"'• > 1 —The lemma is proved. □ 
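The union bound in this proof works because the Bernstein exponent in (4.4) grows faster than the logarithm of the number $4^n$ of sign pairs. The sketch below is a numerical sanity check only, not a step of the proof. It assumes the choice $t = 5r\sqrt{d}/n$, which is the value that makes $tn^2 = 5rn\sqrt{d}$; the function names and the sample parameters are illustrative.

```python
import math


def bernstein_exponent(n, d, r):
    """Exponent (n^2 t^2 / 2) / (d/n + t/3) from (4.4) at t = 5 r sqrt(d) / n."""
    t = 5 * r * math.sqrt(d) / n
    return (n ** 2 * t ** 2 / 2) / (d / n + t / 3)


def union_bound_margin(n, d, r):
    """Exponent minus log(4^n); a positive value means the union bound closes."""
    return bernstein_exponent(n, d, r) - n * math.log(4)


# Hypothetical sparse regime: n = 1000 nodes, expected average degree d = 5.
print(union_bound_margin(1000, 5, 1))
```

A positive margin for given n, d and r means that the tail bound for a single pair (x, y) still leaves room after multiplying by $4^n$.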


Remark 4.7 (Concentration). To better understand Lemma 4.6 as a concentration
result, note that $\|\bar A\|_{\infty\to 1} = nd$ if d is the average expected
degree of the graph (that is, $d = \frac{1}{n}\sum_{i,j=1}^n p_{ij}$). Then
the conclusion of Lemma 4.6 can be stated as

$$\|A - \bar A\|_{\infty\to 1} \le \frac{5r}{\sqrt d}\, \|\bar A\|_{\infty\to 1}.$$

For large d, this means that A concentrates near its mean in the
$\ell_\infty\to\ell_1$ norm.


4.3. Construction of the first core block. We can now quickly deduce 
the existence of the first core block - the one on which the adjacency matrix 
concentrates in the operator norm, as we outlined in (2.3). 

To do this, we first apply Lemma 4.6, then use Grothendieck’s Theorem 4.5
for m = k = n and $\delta = 1/20$, and finally we intersect the subsets I
and J. We conclude the following.

Proposition 4.8 (First core block). Let A be a matrix satisfying the
conclusion of Concentration Lemma 4.6. There exists a subset
$J_1 \subseteq [n]$ which contains all but at most 0.1n indices, and such that

$$\|(A - \bar A)_{J_1\times J_1}\| \le Cr\sqrt{d}.$$

Remark 4.9 (Concentration). To better understand Proposition 4.8, one can
check that $\|\bar A\| \ge d$ if d is the average expected degree of the graph
(that is, $d = \frac{1}{n}\sum_{i,j=1}^n p_{ij}$). Then the conclusion of
Proposition 4.8 can be stated as

$$\|(A - \bar A)_{J_1\times J_1}\| \le \frac{Cr}{\sqrt d}\, \|\bar A\|.$$


5. Expansion of the core, and concentration of the adjacency matrix

Our next goal is to expand the core so it contains all but at most n/d
(rather than 0.1n) vertices. As we explained in Section 2.1, this will be done
by repeatedly constructing core blocks (using Grothendieck’s theorems) in
the parts of the matrix not yet in the core. This time we will require a
slightly stronger upper bound on the average degrees than in Assumption 4.1.
Assumption 5.1 (Directed graphs, stronger bound on expected density). A
is an $n \times n$ random matrix with independent binary entries, and
$\mathbb{E} A = (p_{ij})$. Let number d > e be such that

$$\max_{i,j}\, n p_{ij} \le d.$$

5.1. Concentration in $\ell_\infty\to\ell_1$ norm on blocks. First, we will
need to sharpen the concentration inequality of Lemma 4.6 and make it
sensitive to the size of the blocks.


Lemma 5.2 (Concentration of adjacency matrices in $\ell_\infty\to\ell_1$
norm). Let A be a random matrix satisfying Assumption 5.1. Then for any
$r \ge 1$ the following holds with probability at least $1 - n^{-2r}$.
Consider a block¹ $I \times J$ whose dimensions $m \times k$ satisfy
$\min(m, k) \ge n/4d$. Then

$$\|(A - \bar A)_{I\times J}\|_{\infty\to 1} \le 30r\sqrt{mkd}.$$


Proof. The proof is similar to that of Lemma 4.6, except we take a further
union bound over the blocks $I \times J$ in the end. Let us fix I and J.
Without loss of generality, we may assume that $m \le k$. By definition,

$$\|(A - \bar A)_{I\times J}\|_{\infty\to 1} = \max_{x \in \{-1,1\}^m,\ y \in \{-1,1\}^k} \sum_{i \in I,\, j \in J} (A_{ij} - \bar A_{ij}) x_i y_j. \qquad (5.1)$$

For a fixed pair x, y, we use Bernstein’s inequality as in Lemma 4.6.
Denoting $X_{ij} = (A_{ij} - \bar A_{ij}) x_i y_j$, we obtain

$$\mathbb{P}\Big\{ \sum_{i \in I,\, j \in J} X_{ij} > tmk \Big\} \le \exp\Big( -\frac{mkt^2/2}{d/n + t/3} \Big). \qquad (5.2)$$


Deviating at this point from the proof of Lemma 4.6, we would like this 
probability to be bounded by {en/k)~^^^ in order to make room for the later 
union bound over /, J. One can easily check that this happens if we choose 
$t = 15r\sqrt{d/(mn)}\, \log(en/k)$; this is the place where we use the
assumption $m \ge n/4d$. Thus, taking a union bound over $2^m \cdot 2^k$
choices of pairs x, y and using (5.1) and (5.2), we obtain that

$$\|(A - \bar A)_{I\times J}\|_{\infty\to 1} \le tmk = 15r\sqrt{mkd} \cdot \sqrt{\frac{k}{n}}\, \log\Big(\frac{en}{k}\Big) \qquad (5.3)$$


with probability at least 1 — • {en/k)~^‘^^. We continue by taking a 

union bound over all choices of I and J. Recalling our assumption that 
m < k, we obtain that (5.3) holds uniformly for all /, J, as in the statement 
of the lemma, with probability at least 


1 - 


n 

E 

m=nld k=m 


E 


^m+k (en^-^rk 


k J 


> 1 — n 


-2r 


¹By block we mean a product set $I \times J$ with arbitrary index subsets
$I, J \subseteq [n]$. These subsets are not required to be intervals of
successive integers.

Thus we proved a slightly stronger version of the lemma, since the extra
term $\sqrt{k/n}\, \log(en/k)$ in (5.3) is always bounded by 2. □
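The closing remark of this proof, that the extra factor in (5.3) never exceeds 2, can be checked numerically. The sketch below assumes the factor reads $\sqrt{k/n}\,\log(en/k)$ with $1 \le k \le n$; the helper name `extra_factor` and the grid size are illustrative. With $x = k/n$, the function $\sqrt{x}\,\log(e/x)$ is maximized at $x = 1/e$, where it equals $2/\sqrt{e} \approx 1.21$.

```python
import math


def extra_factor(k, n):
    """sqrt(k/n) * log(e*n/k), the extra term multiplying 15 r sqrt(mkd) in (5.3)."""
    return math.sqrt(k / n) * math.log(math.e * n / k)


n = 10_000  # hypothetical number of nodes
worst = max(extra_factor(k, n) for k in range(1, n + 1))
print(worst)
```

This explains the constant 30 in Lemma 5.2: the factor 15 from the choice of t is multiplied by a quantity that stays below 2.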

As in Section 4.3, we can combine Lemma 5.2 with Grothendieck’s
Theorem 4.5. We conclude the following expansion result.

Lemma 5.3 (Weak expansion of core into a block). Let A be a matrix
satisfying the conclusion of Concentration Lemma 5.2. Consider a block
$I \times J$ whose dimensions $m \times k$ satisfy $\min(m, k) \ge n/4d$. Then
for every $\delta \in (0,1)$ there exists a sub-block $I' \times J'$ of
dimensions at least $(1-\delta)m \times (1-\delta)k$ and such that

$$\|(A - \bar A)_{I'\times J'}\| \le Cr\delta^{-1}\sqrt{d}.$$

5.2. Strong expansion of the core into a block. The core sub-block 
I' X J' constructed in Lemma 5.3 is still too small for our purposes. For 
m < k, we would like J' to miss the number of columns that is a small 
fraction in m (the smaller dimension!) rather than k. To achieve this, we 
can apply Lemma 5.3 repeatedly for the parts of the block not yet in the 
core, until we gain the required number of columns. Let us formally state 
and prove this result. 

Proposition 5.4 (Strong expansion into a block). Let A be a matrix
satisfying the conclusion of Concentration Lemma 5.2. Then any block
$I \times [n]$ with $|I| =: m \ge n/4d$ rows contains a sub-block
$I' \times J'$ of dimensions at least $(m - m/8) \times (n - m/8)$ and such
that

$$\|(A - \bar A)_{I'\times J'}\| \le Cr\sqrt{d}\, \log^2 d.$$

Proof. Let $\delta \in (0,1)$ be a small parameter whose value we will choose
later. The first application of Lemma 5.3 gives us a sub-block
$I_1 \times J_1$ which misses at most $\delta m$ rows and $\delta n$ columns
of $I \times [n]$, and on which A concentrates nicely:

$$\|(A - \bar A)_{I_1\times J_1}\| \le Cr\delta^{-1}\sqrt{d}.$$

If the number of missing columns is too big, i.e. $\delta n > m/8$, we apply
Lemma 5.3 again for the block consisting of the missing columns, that is
for $I \times J_1^c$. It has dimensions at least $m \times \delta n$. We
obtain a sub-block $I_2 \times J_2$ which misses at most $\delta m$ rows and
$\delta^2 n$ columns, and on which A nicely concentrates:

$$\|(A - \bar A)_{I_2\times J_2}\| \le Cr\delta^{-1}\sqrt{d}.$$

If the number of missing columns is still too big, i.e. $\delta^2 n > m/8$,
we continue this process for $I \times (J_1 \cup J_2)^c$, otherwise we stop.
Figure 1 illustrates this process.

The process we just described terminates after a finite number of
applications of Lemma 5.3, which we denote by T. The termination criterion
yields that

$$T \le \frac{\log(8n/m)}{\log(1/\delta)} \le \frac{\log(8d)}{\log(1/\delta)}. \qquad (5.4)$$
Figure 1. Core expansion into a block. First we construct
the leftmost core block $I_1 \times J_1$, then the next core block to
the right $I_2 \times J_2$, etc. The number of remaining columns
reduces exponentially.


(The second inequality follows from the assumption that $m \ge n/d$.) As an
outcome of this process, we obtain disjoint blocks
$I_t \times J_t \subseteq I \times [n]$ which satisfy

$$|I \setminus I_t| \le \delta m \quad\text{and}\quad |J \setminus (J_1 \cup \cdots \cup J_T)| \le m/8 \qquad (5.5)$$

for all t. The matrix A concentrates nicely on each of these blocks:

$$\|(A - \bar A)_{I_t\times J_t}\| \le Cr\delta^{-1}\sqrt{d}.$$

We are ready to choose the index sets I' and J' that would satisfy the
required conclusion. We include in I' all rows of I except those left out at
each of the block extractions, and we include in J' all columns of each block.
Formally, we define

$$I' := I_1 \cap \cdots \cap I_T \quad\text{and}\quad J' := J_1 \cup \cdots \cup J_T. \qquad (5.6)$$

By (5.5), these subsets are adequately large, namely

$$|I \setminus I'| \le T\delta m \quad\text{and}\quad |J \setminus J'| \le m/8. \qquad (5.7)$$

To check that A concentrates on $I' \times J'$, we can decompose this block
into (parts of) the sub-blocks we extracted before, and use the bounds on
their norms. Indeed, using (5.6) we obtain

$$\|(A - \bar A)_{I'\times J'}\| \le \sum_{t=1}^T \|(A - \bar A)_{I'\times J_t}\| \le \sum_{t=1}^T \|(A - \bar A)_{I_t\times J_t}\| \le CTr\delta^{-1}\sqrt{d}. \qquad (5.8)$$

It remains to choose the value of $\delta$. We let $\delta = c/\log(8d)$ where
c > 0 is an absolute constant. Choosing c small enough, we can ensure that
$T\delta \le 1/8$ according to (5.4). This implies that, due to (5.7), the
size of the block $I' \times J'$ we constructed is indeed at least
$(m - m/8) \times (n - m/8)$ as we claimed. Finally, using our choice of
$\delta$ and the bound (5.4) on T we conclude from (5.8) that

$$\|(A - \bar A)_{I'\times J'}\| \le Cr\sqrt{d} \cdot \frac{\log^2(8d)}{c \log[c^{-1}\log(8d)]}.$$

This is slightly better than we claimed. □


5.3. Concentration of the adjacency matrix on the core: final result.
Recall that our goal is to improve upon Proposition 4.8 by expanding
the core set $J_1$ until it contains all but n/d vertices. With the expansion
tool given by Proposition 5.4, we are one step away from this goal. We are
going to show that if the core is not yet as large as we want, we can still
expand it a bit more.

Lemma 5.5 (Expansion of the core that is not too large). Let A be a matrix
satisfying the conclusion of Concentration Lemma 5.2. Consider a subset J
of [n] which contains all but $m \ge n/4d$ indices. Then there exists a subset
J' of [n] which contains all but at most m/2 indices, and such that

$$\|(A - \bar A)_{J'\times J'}\| \le \|(A - \bar A)_{J\times J}\| + Cr\sqrt{d}\, \log^2 d. \qquad (5.9)$$

Proof. We can decompose the entire $[n] \times [n]$ into three disjoint blocks
- the core block $J \times J$ and the two blocks $J^c \times [n]$ and
$J \times J^c$ in which we would like to expand the core; see Figure 2 for
illustration.


Figure 2. Expansion of the core.

Applying Proposition 5.4 to the $m \times n$ block $J^c \times [n]$, we obtain
a sub-block $I_1 \times J_1$ which contains all but at most m/8 of its rows
and columns, and on which A nicely concentrates:

$$\|(A - \bar A)_{I_1\times J_1}\| \le Cr\sqrt{d}\, \log^2 d. \qquad (5.10)$$

Doing the same for the $(n-m) \times m$ block $J \times J^c$ (after
transposing and extending to an $m \times n$ block), we obtain a sub-block
$I_2 \times J_2$ which again contains all but at most m/8 of its rows and
columns, and on which A nicely concentrates:

$$\|(A - \bar A)_{I_2\times J_2}\| \le Cr\sqrt{d}\, \log^2 d. \qquad (5.11)$$

Let $I_0$ denote the set of all rows in [n] except those m/8 + m/8 rows missed
in the construction of either of the two sub-blocks $I_1 \times J_1$ or
$I_2 \times J_2$. Similarly, we let $J_0$ be the set of the columns. The
decomposition of $[n] \times [n]$ considered in the beginning of the proof
induces a decomposition of $I_0 \times J_0$ into three blocks, which are
sub-blocks of $J \times J$, $I_1 \times J_1$ and $I_2 \times J_2$. (This
follows since we remove all missing rows and columns.) Therefore, by triangle
inequality we have

$$\|(A - \bar A)_{I_0\times J_0}\| \le \|(A - \bar A)_{J\times J}\| + \|(A - \bar A)_{I_1\times J_1}\| + \|(A - \bar A)_{I_2\times J_2}\|.$$

Substituting (5.10) and (5.11), we conclude that A nicely concentrates on
the block $I_0 \times J_0$ - just as we desired in (5.9). Since $I_0$ and
$J_0$ may be different sets, we finalize the argument by choosing
$J' = I_0 \cap J_0$. Then $J' \times J'$ is a sub-block of $I_0 \times J_0$, so
the concentration inequality (5.9) now holds as promised. Moreover, since each
of the sets $I_0$ and $J_0$ misses at most m/4 indices, J' misses at most m/2
indices as claimed. □

Lemma 5.5 allows us to keep expanding the core until it misses at most
$(4d)^{-1} n$ vertices.

Theorem 5.6 (Concentration of adjacency matrix on core). Let A be a
random matrix satisfying Assumption 5.1. Then for any $r \ge 1$ the following
holds with probability at least $1 - 2n^{-2r}$. There exists a subset J of [n]
which contains all but at most n/4d indices, and such that

$$\|(A - \bar A)_{J\times J}\| \le Cr\sqrt{d}\, \log^3 d.$$

Proof. Fix a realization of the random matrix A which satisfies the
conclusions of Proposition 4.8 and Concentration Lemma 5.2. Then
Proposition 4.8 gives us the first subset $J_1$ that misses at most 0.1n
indices, and such that

$$\|(A - \bar A)_{J_1\times J_1}\| \le Cr\sqrt{d}.$$

If the number of missing indices is smaller than n/4d, we stop. Otherwise
we apply the Expansion Lemma 5.5. We obtain a subset $J_2$ which misses
at most half as many indices as $J_1$, and for which

$$\|(A - \bar A)_{J_2\times J_2}\| \le Cr\sqrt{d} + Cr\sqrt{d}\, \log^2 d.$$

If the new number of missing indices is smaller than n/4d, we stop.
Otherwise we keep applying the Expansion Lemma 5.5.

Each application of this lemma results in an additive term
$Cr\sqrt{d}\log^2 d$, and it also halves the number of missing indices. By the
stopping criterion, the total number of applications is at most log d. Thus,
after the process stops, the final set J satisfies

$$\|(A - \bar A)_{J\times J}\| \le Cr\sqrt{d} + Cr\sqrt{d}\, \log^2 d \cdot \log d.$$

This completes the proof. □
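The count of expansion steps in this proof can be made explicit. The sketch below tracks only the worst-case number of missing indices, starting from 0.1n and halving until at most n/4d remain; the function name `expansion_steps` is illustrative, and it models each application of Lemma 5.5 as an exact halving, which is the worst case the lemma allows. Under that model the count equals $\lceil \log_2(0.4d) \rceil$ for $d > 2.5$, which is at most $\log_2 d$.

```python
import math


def expansion_steps(n, d):
    """Number of halvings needed to go from 0.1*n missing indices to at most n/(4d)."""
    missing = n / 10
    steps = 0
    while missing > n / (4 * d):
        missing /= 2  # worst case of Lemma 5.5: exactly half of the indices remain missing
        steps += 1
    return steps


# Hypothetical example: n = 10^6 nodes, d = 100.
print(expansion_steps(10**6, 100))  # 6
```

Each step adds one term of order $r\sqrt{d}\log^2 d$ to the norm bound, so a step count of order log d is what produces the exponent 3 on the logarithm in Theorem 5.6.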
5.4. Extending the result for undirected graphs. Theorem 5.6 can be
readily extended for undirected graphs, where the adjacency matrix A is
symmetric, with only entries on and above the diagonal that are independent.
We claimed such result in the adjacency part of Theorem 1.7; let us
restate and prove it.

Theorem 5.7 (Concentration of adjacency matrix on core: undirected
graphs). Let A be a random matrix satisfying the same requirements as in
Assumption 5.1, except A is symmetric. Then for any $r \ge 1$ the following
holds with probability at least $1 - 2n^{-2r}$. There exists a subset J of [n]
which contains all but at most n/2d indices, and such that

$$\|(A - \bar A)_{J\times J}\| \le Cr\sqrt{d}\, \log^3 d.$$

Proof. We decompose the matrix $A = A^+ + A^-$ so that each of $A^+$ and $A^-$
has all independent entries. (Consider the parts of A above and below the
diagonal.) It remains to apply Theorem 5.6 for $A^+$ and $A^-$ and intersect
the two subsets we obtain. The conclusion follows by triangle inequality. □


6. Decomposition of the residual 

In this section we show how to decompose the residual (in fact, any small 
matrix) into two parts, one with sparse rows and the other with sparse 
columns. This will lead to Theorem 2.2, which we will obtain in a slightly 
more informative form as Theorem 6.4 below. 

Again, we will work with directed graphs for most of the time, and in the 
end discuss undirected graphs. 


6.1. Square sub-matrices: selecting a sparse row. First we show how
to select just one sparse row from square sub-matrices. Then we extend this 
to rectangular matrices, and finally we iterate the process to construct the 
required decomposition. 

Lemma 6.1 (Selecting a sparse row). Let A be a random matrix satisfying
Assumption 5.1. Then for any $r \ge 1$ the following holds with probability at
least $1 - n^{-2r}$. Every square sub-matrix of A with at most n/d rows has a
row with at most $10r \log d$ entries that equal 1.


Proof. The argument consists of a standard application of Chernoff’s
inequality and a union bound over the square sub-matrices $A_{I\times J}$.

Let us fix the dimensions $m \times m$ and the support $I \times J$ of a
sub-matrix $A_{I\times J}$ for a moment, and consider one of its rows. The
number of entries that equal 1 in the i-th row, $S_i = \sum_{j \in J} A_{ij}$,
is a sum of m independent Bernoulli random variables $A_{ij}$. Each $A_{ij}$
has expectation at most d/n by the assumptions on A. Thus the expected number
of ones in the i-th row is at most 1, since

$$\mathbb{E} S_i \le \frac{dm}{n} =: \mu,$$

which is bounded by 1 by assumption on m.

To upgrade this to a high-probability statement, we can use Chernoff’s
inequality. It implies that the probability that the i-th row is too dense
(denser than we are seeking in the lemma) is

$$\mathbb{P}\{S_i > 10r \log d\} \le \Big( \frac{10r \log d}{e\mu} \Big)^{-10r \log d} =: p. \qquad (6.1)$$


By independence, the probability that all m rows of $A_{I\times J}$ are too
dense is at most $p^m$. Before we take the union bound over $I \times J$, let
us simplify the probability p in (6.1). Since $r \ge 1$, we have
$10r \log d/(e\mu) \ge 3/\mu = 3n/(dm)$. Therefore

$$\log(1/p) \ge 10r \log(d) \log\Big( \frac{3n}{dm} \Big). \qquad (6.2)$$

By assumption $m \le n/d$ on the number of rows, both logarithms in the
right hand side of (6.2) are bounded below by 1. Then, using the elementary
inequality $2ab \ge a + b$ that is valid for all $a, b \ge 1$, we obtain

$$\log(1/p) \ge 5r \Big[ \log d + \log\Big( \frac{3n}{dm} \Big) \Big] = 5r \log\Big( \frac{3n}{m} \Big).$$

Summarizing, we have shown that for a fixed support $I \times J$, the
probability that all m rows of the sub-matrix $A_{I\times J}$ are too dense is
bounded by

$$p^m \le \Big( \frac{3n}{m} \Big)^{-5rm}.$$

It remains to take a union bound over all supports $I \times J$. This bounds
the failure probability of the conclusion of the lemma by

$$\sum_{m=1}^{n/d} \Big( \frac{en}{m} \Big)^{2m} \Big( \frac{3n}{m} \Big)^{-5mr} \le n^{-2r}.$$

This completes the proof. □


6.2. Rectangular sub-matrices, and iteration. Although we stated Lemma 6.1
for square matrices, it can be easily adapted for rectangular matrices as well.
Indeed, consider an $m \times k$ sub-matrix of A. If the matrix is tall, that
is $m \ge k$, then we can extend it to a square $m \times m$ sub-matrix by
adding arbitrary columns from A. Applying Lemma 6.1, we obtain a sparse row of
the bigger matrix - one with at most $10r \log d$ ones in it. Then trivially
the same row of the original $m \times k$ sub-matrix will be sparse as well.

The same reasoning can be repeated for fat sub-matrices, that is for m < k,
this time by applying Lemma 6.1 to the transpose of A. This way we
obtain a sparse column of a fat sub-matrix. Combining the two cases, we
conclude the following result that is valid for all small sub-matrices.


Lemma 6.2 (Selecting a sparse row or column). Let A be a random matrix
satisfying Assumption 5.1. Then for any $r \ge 1$ the following holds with
probability at least $1 - 2n^{-2r}$. Every sub-matrix of A whose dimensions
$m \times k$ satisfy $\min(m, k) \le n/d$ has a row (if $m \ge k$) or a column
(if m < k) with at most $10r \log d$ entries that equal 1.
Iterating this result - selecting rows and columns one by one - we are
going to obtain a desired decomposition of the residual. Here we adopt the
following convention. Given a subset $\mathcal{R}$ of $[n] \times [n]$, we
denote by $A_{\mathcal{R}}$ the $n \times n$ matrix² that has the same entries
as A on $\mathcal{R}$ and zero outside $\mathcal{R}$.

Theorem 6.3 (Decomposition of the residual). Let A be a random matrix
satisfying Assumption 5.1. Then for any $r \ge 1$ the following holds with
probability $1 - 2n^{-2r}$. Every index subset $I \times J$ of
$[n] \times [n]$ whose dimensions $m \times k$ satisfy $\min(m, k) \le n/d$
can be decomposed into two disjoint subsets $\mathcal{R}$ and $\mathcal{C}$
with the following properties:

(i) each row of $\mathcal{R}$ and each column of $\mathcal{C}$ have at most
$\min(m, k)$ entries³;
(ii) each row of the matrix $A_{\mathcal{R}}$ and each column of the matrix
$A_{\mathcal{C}}$ have at most $10r \log d$ entries that equal 1.

Proof. Let us fix a realization of A for which the conclusion of Lemma 6.2
holds. Suppose we would like to decompose an $m \times k$ sub-matrix
$A_{I\times J}$. According to Lemma 6.2, it has a sparse row or column. Remove
this row or column, and apply Lemma 6.2 for the remaining sub-matrix. We obtain
a sparse row or column of the smaller matrix. Remove it as well, and
apply Lemma 6.2 for the remaining sub-matrix. Continue this process until
we removed everything from $A_{I\times J}$. Then define $\mathcal{R}$ to be the
union of all rows we removed throughout this process, and $\mathcal{C}$ the
union of the removed columns. By construction, $\mathcal{R}$ and $\mathcal{C}$
satisfy part (ii) of the conclusion.

Part (i) follows by analyzing the construction of $\mathcal{R}$ and
$\mathcal{C}$. Without loss of generality, let $m \le k$. The construction
starts by removing columns of $A_{I\times J}$ (which obviously have m entries
as required) until the aspect ratio reverses, i.e. there remain fewer columns
than m. After that point, both dimensions of the remaining sub-matrix are
again bounded by m, so part (i) follows. □
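The peeling procedure used in this proof is easy to state as code. The following is a minimal sketch under stated assumptions rather than the authors' implementation: the function name `decompose` is illustrative, the input is a 0/1 matrix stored as nested lists, and at each step the sparsest row (for a tall or square remainder) or the sparsest column (for a fat remainder) is removed, which realizes the row or column whose existence Lemma 6.2 guarantees on the good event.

```python
def decompose(A, rows, cols):
    """Peel the block rows x cols into row-type entries R and column-type entries C."""
    rows, cols = list(rows), list(cols)
    R, C = set(), set()
    while rows and cols:
        if len(rows) >= len(cols):
            # Tall or square remainder: remove the row with the fewest ones.
            i = min(rows, key=lambda i: sum(A[i][j] for j in cols))
            R.update((i, j) for j in cols)
            rows.remove(i)
        else:
            # Fat remainder: remove the column with the fewest ones.
            j = min(cols, key=lambda j: sum(A[i][j] for i in rows))
            C.update((i, j) for i in rows)
            cols.remove(j)
    return R, C


# Hypothetical 4 x 6 adjacency block.
A = [[1, 0, 0, 1, 0, 0],
     [0, 0, 1, 0, 0, 1],
     [1, 1, 0, 0, 0, 0],
     [0, 0, 0, 0, 1, 0]]
R, C = decompose(A, range(4), range(6))
print(len(R), len(C))
```

The sets R and C partition the block, and part (i) of the theorem can be read off the loop: a row is removed only when there are at least as many rows as columns left, so it contributes at most min(m, k) entries to R, and symmetrically for columns.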

6.3. Extending the result for undirected graphs. Theorem 6.3 can
be readily extended for undirected graphs. We stated such result as
Theorem 2.3; let us restate it in a somewhat more informative form.

Theorem 6.4 (Decomposition of the residual, undirected graphs). Let A
be a random matrix satisfying the same requirements as in Assumption 5.1,
except A is symmetric. Then for any $r \ge 1$ the following holds with
probability $1 - 2n^{-2r}$. Every index subset $I \times J$ of
$[n] \times [n]$ whose dimensions $m \times k$ satisfy $\min(m, k) \le n/d$
can be decomposed into two disjoint subsets $\mathcal{R}$ and $\mathcal{C}$
with the following properties:

(i) each row of $\mathcal{R}$ and each column of $\mathcal{C}$ have at most
$2\min(m, k)$ entries;

(ii) each row of the matrix $A_{\mathcal{R}}$ and each column of the matrix
$A_{\mathcal{C}}$ have at most $20r \log d$ entries that equal 1.

²This does not exactly agree with our usage of $A_{I\times J}$ which denotes
an $|I| \times |J|$ matrix, but this slight disagreement will not cause
confusion.

³Formally, for $\mathcal{R}$ this means that
$|\{j : (i,j) \in \mathcal{R}\}| \le \min(m, k)$ for each $i \in [n]$, and
similarly for $\mathcal{C}$.
Proof. We decompose the matrix $A = A^+ + A^-$ so that each of $A^+$ and $A^-$
has all independent entries. (Consider the parts of A above and below the
diagonal.) It remains to apply Theorem 6.3 for $A^+$ and $A^-$ and choose
$\mathcal{R}$ to be the union of the disjoint sets $\mathcal{R}^+$ and
$\mathcal{R}^-$ we obtain this way; similarly for $\mathcal{C}$. The conclusion
follows trivially. □

7. Concentration of the Laplacian on the core 

In this section we translate the concentration result on the core,
Theorem 5.6, from adjacency matrices to Laplacian matrices. This will lead to
the second part of Theorem 1.7.

From now on, we will focus on undirected graphs, where A is a symmetric
matrix. Throughout this section, it will be more convenient to work with
the alternative Laplacian defined in (2.4) as

$$L(A) = I - \mathcal{L}(A) = D^{-1/2} A D^{-1/2}.$$

Clearly, the concentration results are the same for both definitions of the
Laplacian, since $L(A) - L(\bar A) = \mathcal{L}(\bar A) - \mathcal{L}(A)$
(and similarly for $A_\tau$).

7.1. Concentration of degrees. We will easily deduce concentration of
L(A) on the core from concentration of the adjacency matrix A (which we
already proved in Theorem 5.6) and the degree matrix
$D = \mathrm{diag}(d_j)$. The following lemma establishes concentration of D
on the core.

Lemma 7.1 (Concentration of degrees on core). Let A be a random matrix 
satisfying the same requirements as in Assumption 5.1, except A is symmet¬ 
ric. Then for any r > 1, the following holds with probability at least 1 — 

There exists a subset J of [n] which contains all but at most n/2d indices,
and such that the degrees $d_j = \sum_{i=1}^n A_{ij}$ satisfy

$$|d_j - \mathbb{E} d_j| \le 30r\sqrt{d}\, \log d \quad\text{for all } j \in J.$$

Proof. Let us fix $j \in [n]$ for a moment. We decompose A into an upper
triangular and a lower triangular matrix, each of which has independent
entries. This induces the decomposition of the degrees

$$d_j = \sum_{i=1}^n A_{ij} = \sum_{i=j}^n A_{ij} + \sum_{i=1}^{j-1} A_{ij} =: d_j^- + d_j^+.$$

By triangle inequality, it is enough to show that $d_j^-$ and $d_j^+$
concentrate near their own expected values. Without loss of generality, let us
do this for $d_j^+$.

By construction, $d_j^+$ is a sum of n independent Bernoulli random variables
(including n − j + 1 zeros) whose variances are all bounded by d/n by
assumption on A. Thus Bernstein’s inequality (see Theorem 2.10 in [12])
yields

$$\mathbb{P}\{ |d_j^+ - \mathbb{E} d_j^+| > nt \} \le \exp\Big( -\frac{nt^2/2}{d/n + t/3} \Big), \quad t > 0.$$
Choosing $t = 15(r/n)\sqrt{d}\, \log d$ and simplifying the probability bound,
we obtain

P||d+ -Ed+I > Ibry/dlogd^ < 

We choose $J^+$ to consist of the indices j for which $|d_j^+ - \mathbb{E} d_j^+| \le 15r\sqrt{d}\, \log d$.
To control the size of the complement we may view it as a sum of 

n independent Bernoulli random variables, each with expectation at most 
]g |(J+)'=| < nd~^^^, and Chernoff’s inequality implies that 

Simplifying, we see that this probability is bounded by 2n~^^. 

Repeating the argument for $d_j^-$, we obtain a similar set $J^-$. Choosing J
to be the intersection of $J^+$ and $J^-$ and combining the two concentration
bounds by triangle inequality, we complete the proof. □

7.2. Concentration of Laplacian on core. We are ready to prove the 
second part of Theorem 1.7, which we restate as follows. 

Theorem 7.2 (Concentration of Laplacian on core). Let A be a matrix 
satisfying Assumption 1.2. Then for any r > 1, the following holds with 
probability at least 1 — There exists a subset J of [n] which contains 

all but at most n/d indices, and such that

$$\|(L(A) - L(\bar A))_{J\times J}\| \le \frac{Cr\alpha^2 \log^3 d}{\sqrt d}. \qquad (7.1)$$

Proof. We need to compare the Laplacians

$$L(A) = D^{-1/2} A D^{-1/2} \quad\text{and}\quad L(\bar A) = \bar D^{-1/2} \bar A \bar D^{-1/2}$$

on a big core block $J \times J$, where $D = \mathrm{diag}(d_i)$ contains the
actual degrees $d_i$, and $\bar D = \mathrm{diag}(\bar d_i)$ the expected
degrees $\bar d_i = \mathbb{E} d_i$.

We get the core set J by intersecting the two corresponding sets on which
A concentrates (from Theorem 5.7) and the degree matrix D concentrates
(from Lemma 7.1). To keep the notation simple, let us write the Laplacians
on the core as

$$L(A)_{J\times J} = BRB \quad\text{and}\quad L(\bar A)_{J\times J} = \bar B \bar R \bar B,$$

where obviously $R = A_{J\times J}$, $\bar R = \bar A_{J\times J}$,
$B = D^{-1/2}_{J\times J}$ and $\bar B = \bar D^{-1/2}_{J\times J}$. Then
we can express the difference of the Laplacians as a telescoping sum

$$(L(A) - L(\bar A))_{J\times J} = B(R - \bar R)B + B\bar R(B - \bar B) + (B - \bar B)\bar R \bar B. \qquad (7.2)$$

We will estimate each of the three terms separately.
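For readers who want to verify the telescoping identity (7.2), it follows by adding and subtracting two intermediate products. The derivation below writes bars for the expected quantities, matching the notation $\bar R = \bar A_{J\times J}$ and $\bar B = \bar D^{-1/2}_{J\times J}$ of the proof; the final line is the norm bound that the three separate estimates feed into.

```latex
\begin{align*}
BRB - \bar B \bar R \bar B
  &= B(R - \bar R)B + B \bar R B - \bar B \bar R \bar B \\
  &= B(R - \bar R)B + B \bar R (B - \bar B) + B \bar R \bar B - \bar B \bar R \bar B \\
  &= B(R - \bar R)B + B \bar R (B - \bar B) + (B - \bar B) \bar R \bar B, \\[4pt]
\|BRB - \bar B \bar R \bar B\|
  &\le \|B\|^2 \, \|R - \bar R\|
     + \|B\| \, \|\bar R\| \, \|B - \bar B\|
     + \|B - \bar B\| \, \|\bar R\| \, \|\bar B\|.
\end{align*}
```

Only norms of B, $\bar B$, $\bar R$, $R - \bar R$ and $B - \bar B$ appear on the right, which is why the proof needs no direct control of $\|R\|$.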

By the conclusion of Theorem 5.6, we have

$$\|R - \bar R\| \le Cr\sqrt{d}\, \log^3 d. \qquad (7.3)$$

Moreover, since all entries of $\bar A$ are bounded by d/n by assumption, we
have $\|\bar A\| \le d$, and in particular the sub-matrix $\bar R$ must also
satisfy

$$\|\bar R\| \le d. \qquad (7.4)$$

Next we compare B and $\bar B$, which are diagonal matrices with entries
$d_j^{-1/2}$ and $\bar d_j^{-1/2}$ on the diagonal, respectively. Since
$\bar d_j \ge d_0$ by assumption, we have

$$\|\bar B\| \le \frac{1}{\sqrt{d_0}}. \qquad (7.5)$$


Moreover, by the conclusion of Lemma 7.1, the degrees satisfy

$$|d_j - \bar d_j| \le 30r\sqrt{d}\, \log d \quad\text{for all } j \in J. \qquad (7.6)$$

We can assume that the right hand side here is bounded by $d_0/2$; otherwise
the right hand side in the desired bound (7.1) is greater than two, which
makes the bound trivially true. Therefore, in particular, (7.6) implies

$$d_j \ge \bar d_j - d_0/2 \ge d_0/2. \qquad (7.7)$$

The difference between the corresponding entries of B and $\bar B$ is

$$\big| d_j^{-1/2} - \bar d_j^{-1/2} \big| = \frac{|d_j - \bar d_j|}{(d_j^{1/2} + \bar d_j^{1/2})(d_j \bar d_j)^{1/2}}.$$

Since $\bar d_j \ge d_0$ by definition, $d_j \ge d_0/2$ by (7.7), and
$|d_j - \bar d_j|$ is small by (7.6), this expression is bounded by

$$\frac{30r\sqrt{d}\, \log d}{d_0^{3/2}} = \frac{30r\sqrt{\alpha}\, \log d}{d_0}.$$

This and (7.7) imply that

$$\|B - \bar B\| \le \frac{30r\sqrt{\alpha}\, \log d}{d_0} \quad\text{and}\quad \|B\| \le \sqrt{\frac{2}{d_0}}. \qquad (7.8)$$

It remains to substitute into (7.2) the bounds (7.3) for $R - \bar R$, (7.4)
for $\bar R$, (7.8) for $B - \bar B$ and B, and (7.5) for $\bar B$. Using
triangle inequality and recalling that $d = \alpha d_0$, we obtain (7.1) and
complete the proof. □


Remark 7.3 (Regularized Laplacian). We just showed that the Laplacian
concentrates on the core even without regularization. It is also true with
regularization. Indeed, Theorem 7.2 holds for the regularized Laplacian
$L(A_\tau) = I - \mathcal{L}(A_\tau)$, and it states that

$$\|(L(A_\tau) - L(\bar A_\tau))_{J\times J}\| \le \frac{Cr\alpha^2 \log^3 d}{\sqrt d} \quad\text{for any } \tau > 0. \qquad (7.9)$$

This is true because Theorem 7.2 is based on concentration of the adjacency
matrix A and the degree matrix D on the core. Both of these results
trivially hold with regularization as well as without it, as the
regularization parameter $\tau$ cancels out, e.g.
$A_\tau - \bar A_\tau = A - \bar A$. We leave details to the interested reader.
8. Control of the Laplacian on the residual, and proof of Theorem 1.3

8.1. Laplacian is small on the residual. Now we demonstrate how
regularization makes the Laplacian more stable. We express this as the fact
that small sub-matrices of the regularized Laplacian $L(A_\tau)$ have small
norms. This fact can be easily deduced from the sparse decomposition of such
matrices that we constructed in Theorem 6.3 and the following elementary
observation.


Lemma 8.1 (Restriction of Laplacian). Let B be an $n \times n$ symmetric
matrix with non-negative entries, and let $\mathcal{C}$ be a subset of
$[n] \times [n]$. Consider the $n \times n$ matrix $B_{\mathcal{C}}$ that has
the same entries as B on $\mathcal{C}$ and zero outside $\mathcal{C}$. Let
$\varepsilon \in (0,1)$. Suppose the sum of entries of each row of
$B_{\mathcal{C}}$ is at most $\varepsilon$ times the sum of entries of the
corresponding row of B. Then

$$\|(L(B))_{\mathcal{C}}\| \le \sqrt{\varepsilon}.$$


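Proof. Let us denote by $L(B_{\mathcal{C}})$ an analog of the Laplacian for
the possibly non-symmetric matrix $B_{\mathcal{C}}$, that is

$$L(B_{\mathcal{C}}) = D_r^{-1/2} B_{\mathcal{C}} D_c^{-1/2}.$$

Here $D_r = \mathrm{diag}(B_{\mathcal{C}} \mathbf{1})$ is a diagonal matrix and
each diagonal entry $(D_r)_{i,i}$ of $D_r$ is the sum of entries of the i-th
row of $B_{\mathcal{C}}$; $D_c = \mathrm{diag}(B_{\mathcal{C}}^T \mathbf{1})$
is a diagonal matrix and $(D_c)_{i,i}$ is the sum of entries of the i-th
column of $B_{\mathcal{C}}$. Note that we can write $L(B)_{\mathcal{C}}$ as

$$L(B)_{\mathcal{C}} = D^{-1/2} B_{\mathcal{C}} D^{-1/2},$$

where $D = \mathrm{diag}(B\mathbf{1}) = \mathrm{diag}(B^T\mathbf{1})$. We have
$(D_r)_{i,i} \le \varepsilon D_{i,i}$ by the assumption and
$(D_c)_{i,i} \le D_{i,i}$ because $\mathcal{C}$ is a subset of
$[n] \times [n]$. Since entries of both $L(B_{\mathcal{C}})$ and
$(L(B))_{\mathcal{C}}$ are non-negative, we obtain

$$\|(L(B))_{\mathcal{C}}\| \le \sqrt{\varepsilon}\, \|L(B_{\mathcal{C}})\|.$$

It remains to prove $\|L(B_{\mathcal{C}})\| \le 1$. To see this, consider a
$2n \times 2n$ symmetric matrix

$$S = \begin{pmatrix} 0_n & B_{\mathcal{C}} \\ B_{\mathcal{C}}^T & 0_n \end{pmatrix},$$

where $0_n$ is an $n \times n$ matrix whose entries are zero. The Laplacian of
S has the form

$$L(S) = \begin{pmatrix} 0_n & L(B_{\mathcal{C}}) \\ L(B_{\mathcal{C}})^T & 0_n \end{pmatrix}.$$

Since L(S) has norm one, it follows that $\|L(B_{\mathcal{C}})\| \le 1$. This
completes the proof. □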
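The Restriction Lemma can be sanity-checked on a small example. The sketch below is illustrative only: the helper names `laplacian`, `restrict` and `spectral_norm` are hypothetical, the matrix B is the all-ones 6 × 6 matrix, and the operator norm is estimated by power iteration started from the all-ones vector (adequate for matrices with non-negative entries such as this one). The index set C is a 2 × 3 block, so each row of $B_{\mathcal{C}}$ sums to at most 3 = 0.5 · 6 and the lemma predicts a norm of at most $\sqrt{0.5}$.

```python
import math


def laplacian(B):
    """L(B) = D^{-1/2} B D^{-1/2} with D = diag(row sums); B symmetric, positive row sums."""
    deg = [sum(row) for row in B]
    n = len(B)
    return [[B[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def restrict(M, C):
    """Keep the entries of M indexed by C and zero out the rest."""
    n = len(M)
    return [[M[i][j] if (i, j) in C else 0.0 for j in range(n)] for i in range(n)]


def spectral_norm(M, iters=200):
    """Estimate the operator norm by power iteration on M^T M from the all-ones vector."""
    n = len(M)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        Mv = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        w = [sum(M[i][j] * Mv[i] for i in range(n)) for j in range(n)]
        norm_w = math.sqrt(sum(x * x for x in w))
        if norm_w == 0:
            return 0.0
        v = [x / norm_w for x in w]
    Mv = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
    return math.sqrt(sum(x * x for x in Mv))


n = 6
B = [[1.0] * n for _ in range(n)]
C = {(i, j) for i in range(2) for j in range(3)}
eps = 0.5  # each row of B restricted to C sums to at most 3 = eps * 6
norm = spectral_norm(restrict(laplacian(B), C))
print(norm, math.sqrt(eps))
```

In this example the restricted Laplacian is 1/6 times the indicator of a 2 × 3 block, so its norm is $\sqrt{6}/6 \approx 0.41$, comfortably below $\sqrt{\varepsilon} \approx 0.71$.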


Theorem 8.2 (Regularized Laplacian on residual). Let A be a random
matrix satisfying the same requirements as in Assumption 5.1, except A is
symmetric. Then for any $r \ge 1$ the following holds with probability
$1 - 2n^{-2r}$. Any sub-matrix $L(A_\tau)_{I\times J}$ of the regularized
Laplacian $L(A_\tau)$ with at most n/d rows or columns satisfies

\\L{Ar)lxJ 


2 

Vd 


for any r > 0. 


Proof. The decomposition $I \times J = \mathcal{R} \cup \mathcal{C}$ we
constructed in Theorem 6.4 reduces the problem to bounding
$L(A_\tau)_{\mathcal{R}}$ and $L(A_\tau)_{\mathcal{C}}$. Let us focus on
$L(A_\tau)_{\mathcal{R}}$. Recall that every row of the index set
$\mathcal{R}$ has at most $n/d$ entries, and every row of the matrix
$A_{\mathcal{R}}$ has at most $10 r \log d$ entries that equal one (while all
other entries are zero). This implies that the sum of entries of each row of
$(A_\tau)_{\mathcal{R}}$ is bounded by

\[
  \frac{n\tau}{d} + 10 r \log d.
\]

We compare this to the sum of the entries of each row of $A_\tau$, which is
trivially at least $n\tau$. It is worthwhile to note that this is the only
place in the entire argument where regularization is crucially used. Applying
the Restriction Lemma 8.1, we obtain

\[
  \|L(A_\tau)_{\mathcal{R}}\| \le \sqrt{\frac{1}{d} + \frac{10 r \log d}{n\tau}}.
\]

Repeating the same reasoning for columns, we obtain the same bound for
$L(A_\tau)_{\mathcal{C}}$. Using the triangle inequality and simplifying the
expression via $\sqrt{x+y} \le \sqrt{x} + \sqrt{y}$, we conclude the desired
bound for $L(A_\tau)_{I \times J}$. □
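
To illustrate how regularization enters through the row and column sums, here is a hedged numerical sketch in Python with NumPy. It does not implement the decomposition of Theorem 6.4. Instead it applies Lemma 8.1 in its column form directly to a block of $n/d$ rows. The graph model, the choice $\tau = d/n$ and the choice of the highest-degree rows as the block are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 400, 10
# Sparse symmetric 0/1 adjacency matrix with edge probability d/n, no self-loops.
upper = np.triu(rng.random((n, n)) < d / n, k=1)
A = (upper | upper.T).astype(float)

tau = d / n                                   # illustrative choice, so that n * tau = d
A_tau = A + tau                               # add tau to every entry
deg = A_tau.sum(axis=1)                       # row sums, each at least n * tau
L = A_tau / np.sqrt(np.outer(deg, deg))       # regularized Laplacian L(A_tau)

# Residual-type block: the n/d rows of highest degree, all columns.
I = np.argsort(A.sum(axis=1))[-(n // d):]
block_norm = np.linalg.norm(L[I, :], 2)

# Column form of Lemma 8.1: each column of the block keeps at most a
# fraction eps of the full column sum of A_tau.
eps = np.max(A_tau[I, :].sum(axis=0) / A_tau.sum(axis=0))
print(block_norm, np.sqrt(eps))
```

Because every column sum of $A_\tau$ contains the term $n\tau$, the ratio `eps` stays strictly below one even for columns whose neighbors all lie in the chosen rows; without regularization this ratio could equal one.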


Let us notice a similar, and much simpler, bound for the Laplacian of the
regularized expected matrix $\bar{A}_\tau = \mathbb{E} A_\tau$.


Lemma 8.3 (Regularized Laplacian of the expected matrix on the residual).
Let $A$ be a random matrix satisfying the same requirements as in
Assumption 5.1, except $A$ is symmetric. Then any sub-matrix
$L(\bar{A}_\tau)_{I \times J}$ with at most $n/d$ rows or columns satisfies

\[
  \|L(\bar{A}_\tau)_{I \times J}\| \le \frac{1}{\sqrt{d}} + \frac{1}{\sqrt{n\tau}}
\]

for any $\tau > 0$.


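Proof. Assume that $L(\bar{A}_\tau)_{I \times J}$ has at most $n/d$ rows.
Recall that the matrix $\bar{A}_\tau$ has entries $p_{ij} + \tau$. Then the
sum of entries of each column of the sub-matrix $(\bar{A}_\tau)_{I \times J}$
is at most

\[
  \frac{n}{d} \cdot \max_{i,j} (p_{ij} + \tau) \le \frac{n\tau}{d} + 1.
\]

We compare this to the sum of entries of each column of $\bar{A}_\tau$, which
is at least $n\tau$. Applying the Restriction Lemma 8.1, we obtain

\[
  \|L(\bar{A}_\tau)_{I \times J}\| \le \sqrt{\frac{1}{d} + \frac{1}{n\tau}}.
\]

This leads to the desired conclusion. □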
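
The bound of Lemma 8.3 can be checked on a concrete expected matrix. The sketch below (Python with NumPy) uses a two-block expected adjacency matrix with entries $a/n$ and $b/n$; the parameter values, the choice $n\tau = (a+b)/2$ and the particular block of rows are illustrative assumptions rather than part of the lemma.

```python
import numpy as np

n, a, b = 200, 10, 2
labels = np.repeat([0, 1], n // 2)
same = labels[:, None] == labels[None, :]
P = np.where(same, a / n, b / n)              # expected matrix with entries p_ij
d = a                                         # d = max_ij n * p_ij, equal to a here

tau = (a + b) / (2 * n)                       # illustrative choice: n * tau = (a + b) / 2
P_tau = P + tau                               # regularized expected matrix
deg = P_tau.sum(axis=1)
L_bar = P_tau / np.sqrt(np.outer(deg, deg))   # Laplacian of the regularized expected matrix

I = np.arange(n // d)                         # any set of at most n/d rows
block_norm = np.linalg.norm(L_bar[I, :], 2)
sqrt_bound = np.sqrt(1 / d + 1 / (n * tau))   # bound from Lemma 8.1
simple_bound = 1 / np.sqrt(d) + 1 / np.sqrt(n * tau)
print(block_norm, sqrt_bound, simple_bound)
```

The three printed values should appear in non-decreasing order: the block norm, the square-root bound, and its simplified form.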
8.2. Concentration of the regularized Laplacian. We are ready to deduce
the main Theorem 1.3 in a slightly stronger form.


Theorem 8.4 (Concentration of the regularized Laplacian). Let $A$ be a
random matrix satisfying Assumption 1.2. Then for any $r \ge 1$, with
probability at least $1 - n^{-r}$ we have

\[
  \|L(A_\tau) - L(\bar{A}_\tau)\| \le \frac{C r \alpha^2 \log^3 d}{\sqrt{d}} + \frac{20 \sqrt{r \log d}}{\sqrt{n\tau}}
\]

for any $\tau > 0$.


Proof. The proof is a combination of the Concentration Theorem 7.2 and the
Restriction Theorem 8.2. Fix a realization of $A$ for which the conclusions
of both of these results hold. Theorem 7.2 yields the existence of a core
set $J$ that contains all but at most $n/d$ indices from $[n]$, and on which
the regularized Laplacian concentrates:

\[
  \|(L(A_\tau) - L(\bar{A}_\tau))_{J \times J}\| \le \frac{C r \alpha^2 \log^3 d}{\sqrt{d}}. \tag{8.1}
\]

(Here we used the version (7.9) that is valid for the regularized Laplacian.)
Next, let us decompose the residual $[n] \times [n] \setminus J \times J$ into
two blocks $J^c \times [n]$ and $J \times J^c$. The first block has at most
$n/d$ rows, so the conclusion of Restriction Theorem 8.2 applies to it. It
follows that

\[
  \|L(A_\tau)_{J^c \times [n]}\| \le \frac{2}{\sqrt{d}} + \frac{\sqrt{40 r \log d}}{\sqrt{n\tau}}.
\]

An even simpler bound holds for the expected version
$L(\bar{A}_\tau)_{J^c \times [n]}$ according to Lemma 8.3. Summing these two
bounds by the triangle inequality, we conclude that

\[
  \|(L(A_\tau) - L(\bar{A}_\tau))_{J^c \times [n]}\| \le \frac{3}{\sqrt{d}} + \frac{\sqrt{40 r \log d} + 1}{\sqrt{n\tau}}.
\]

In a similar way we obtain the same bound for the restriction onto the second
residual block, $J \times J^c$. Combining these two bounds with (8.1), we
complete the proof by the triangle inequality. □
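
The final step of the proof rests on splitting $[n] \times [n]$ into the core block $J \times J$ and the two residual blocks $J^c \times [n]$ and $J \times J^c$. The sketch below (Python with NumPy) checks on a random symmetric matrix that these three blocks partition the index set and that the operator norm obeys the triangle inequality across them. The matrix `M`, the helper `restrict` and the randomly chosen index sets are hypothetical stand-ins, not objects from the paper.

```python
import numpy as np

def restrict(M, rows, cols):
    """Zero out every entry of M outside the block rows x cols."""
    out = np.zeros_like(M)
    out[np.ix_(rows, cols)] = M[np.ix_(rows, cols)]
    return out

rng = np.random.default_rng(2)
n, d = 60, 6
M = rng.standard_normal((n, n))
M = (M + M.T) / 2                                  # stand-in for the difference of Laplacians

all_idx = np.arange(n)
J_c = rng.choice(n, size=n // d, replace=False)    # hypothetical residual indices
J = np.setdiff1d(all_idx, J_c)                     # hypothetical core set

core = restrict(M, J, J)                           # block J x J
res1 = restrict(M, J_c, all_idx)                   # block J^c x [n]
res2 = restrict(M, J, J_c)                         # block J x J^c

op = lambda X: np.linalg.norm(X, 2)                # operator (spectral) norm
print(op(M), op(core) + op(res1) + op(res2))
```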


9. Proof of Corollary 3.1 (community detection) 


Proof of Corollary 3.1. Note that $n\tau$ is the average node degree, with
expectation $(a+b)/2$. Using Bernstein's inequality (see Theorem 2.10 in [12]), it
is easy to check that with probability at least 1 — we have 


\[
  \left| n\tau - \frac{a+b}{2} \right| \le \frac{16 r}{\sqrt{a+b}} \cdot \frac{a+b}{2}. \tag{9.1}
\]


It follows from assumption (3.2) (and increasing the constant $C$ if necessary)
that $16r/\sqrt{a+b} \le 1/2$. Therefore (9.1) implies

\[
  \left| n\tau - \frac{a+b}{2} \right| \le \frac{a+b}{4}. \tag{9.2}
\]
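
As a quick empirical illustration of (9.2), the sketch below (Python with NumPy) samples one graph from a two-block model with within-block edge probability $a/n$ and between-block probability $b/n$, and compares the average degree with $(a+b)/2$. The parameter values and the seed are arbitrary assumptions; a single sample is of course no substitute for the probabilistic statement.

```python
import numpy as np

rng = np.random.default_rng(3)
n, a, b = 1000, 20, 5
labels = np.repeat([0, 1], n // 2)
P = np.where(labels[:, None] == labels[None, :], a / n, b / n)

# Independent edges above the diagonal, then symmetrize (no self-loops).
upper = np.triu(rng.random((n, n)) < P, k=1)
A = (upper | upper.T).astype(float)

n_tau = A.sum() / n            # average node degree, playing the role of n * tau
print(n_tau, (a + b) / 2)
```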
Let us fix a realization of the random matrix $A$ which satisfies (9.2) and the
conclusion of Theorem 1.3. For the model $G(n, a/n, b/n)$ we have $d = a$ and
$\alpha \le 2$. From Theorem 1.3 and (9.2) we obtain

\[
  \|L(A_\tau) - L(\bar{A}_\tau)\|
  \le C'' r \log^3(a) \left( \frac{1}{\sqrt{a}} + \frac{1}{\sqrt{n\tau}} \right)
  \le \frac{3 C'' r \log^3 a}{\sqrt{a}} =: \delta, \tag{9.3}
\]

for some absolute constant $C'' > 0$.

We will use the Davis-Kahan Theorem (see Theorem VII.3.2 in [8]) and (9.3)
to bound the difference between $v$ and $\bar{v}$. The matrix $L(\bar{A}_\tau)$
has two non-zero eigenvalues: $\lambda_1 = 1$ and
$\lambda_2 = (a-b)/(a+b+n\tau)$. By (9.2) we have

\[
  \frac{4(a-b)}{7(a+b)} \le \lambda_2 \le \frac{4(a-b)}{5(a+b)} \le \frac{4}{5}. \tag{9.4}
\]


To lower-bound the gaps in the spectra of $L(A_\tau)$ and $L(\bar{A}_\tau)$,
let us denote

\[
  S = (\lambda_2 - \delta, \; 4/5 + \delta), \qquad
  S' = (-\delta, \delta) \cup (1 - \delta, 1 + \delta).
\]

Then $\lambda_2 \in S$ because $\lambda_2 \le 4/5$ by (9.4); the remaining
eigenvalues of $L(\bar{A}_\tau)$, which are either zero or one, are in $S'$.
Inequality (9.3) implies, by Weyl's inequality, that eigenvalues of
$L(A_\tau)$ are at most $\delta$ away from the corresponding eigenvalues of
$L(\bar{A}_\tau)$. Therefore the second largest eigenvalue of $L(A_\tau)$ is
in $S$ and the remaining eigenvalues of $L(A_\tau)$ are in $S'$.
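
The eigenvalue perturbation step used here is the standard fact that, for symmetric matrices, sorted eigenvalues move by at most the operator norm of the perturbation. A minimal numerical sketch in Python with NumPy follows; the random matrices `M` and `E` are arbitrary stand-ins for $L(\bar{A}_\tau)$ and $L(A_\tau) - L(\bar{A}_\tau)$, not the actual Laplacians.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50
M = rng.standard_normal((n, n))
M = (M + M.T) / 2                       # symmetric "unperturbed" matrix
E = 0.1 * rng.standard_normal((n, n))
E = (E + E.T) / 2                       # symmetric perturbation

eig_M = np.linalg.eigvalsh(M)           # eigenvalues sorted in ascending order
eig_ME = np.linalg.eigvalsh(M + E)
delta = np.linalg.norm(E, 2)            # operator norm of the perturbation
max_shift = np.max(np.abs(eig_ME - eig_M))
print(max_shift, delta)
```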

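Note that $S$ and $S'$ are disjoint because $\delta$ is small compared to
$\lambda_2$. In fact, from the definition of $\delta$, assumption (3.2)
(increasing the constant $C$ if necessary), and (9.4) we have

\[
  \delta \le \frac{3C''}{\sqrt{a}} \cdot \frac{\varepsilon (a-b)}{\sqrt{C(a+b)}}
  \le \frac{3\sqrt{2}\, C'' \varepsilon}{\sqrt{C}} \cdot \frac{a-b}{a+b}
  \le \frac{\varepsilon \lambda_2}{20}, \tag{9.5}
\]

where the middle step uses $a \ge (a+b)/2$.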

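Using (9.4) and (9.5), we bound the distance $\mathrm{dist}(S, S')$ between
$S$ and $S'$ as follows:

\[
  \mathrm{dist}(S, S') = \min\left\{ \lambda_2 - 2\delta, \; \frac{1}{5} - 2\delta \right\}
  \ge \frac{\lambda_2}{4} - 2\delta \ge \frac{\lambda_2}{8}. \tag{9.6}
\]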


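Applying Theorem VII.3.2 in [8] and using (9.3), (9.6), (9.5) we obtain

\[
  \|v v^{\mathsf T} - \bar{v} \bar{v}^{\mathsf T}\|
  \le \frac{(\pi/2) \, \|L(A_\tau) - L(\bar{A}_\tau)\|}{\mathrm{dist}(S, S')}
  \le \frac{(\pi/2) \, \delta}{\lambda_2 / 8}
  \le \frac{\pi \varepsilon}{5}. \tag{9.7}
\]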


It is easy to check that

\[
  \min_{\beta = \pm 1} \|v + \beta \bar{v}\|_2 \le \sqrt{2} \, \|v v^{\mathsf T} - \bar{v} \bar{v}^{\mathsf T}\|. \tag{9.8}
\]
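
Inequality (9.8) holds for any pair of unit vectors, which makes it easy to test numerically. The sketch below (Python with NumPy) draws random unit vectors and records the smallest observed slack between the two sides; the dimension, the number of trials and the variable names are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
worst_gap = np.inf
for _ in range(1000):
    v = rng.standard_normal(5)
    v /= np.linalg.norm(v)                  # random unit vector
    w = rng.standard_normal(5)
    w /= np.linalg.norm(w)                  # second random unit vector
    # Left side of (9.8): distance up to a sign flip.
    lhs = min(np.linalg.norm(v + w), np.linalg.norm(v - w))
    # Right side of (9.8): sqrt(2) times the operator norm of the projector difference.
    rhs = np.sqrt(2) * np.linalg.norm(np.outer(v, v) - np.outer(w, w), 2)
    worst_gap = min(worst_gap, rhs - lhs)
print(worst_gap)
```

A non-negative value of `worst_gap` (up to rounding) is consistent with (9.8).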


Therefore from (9.7) and (9.8) we have
$\min_{\beta = \pm 1} \|v + \beta \bar{v}\|_2 \le \sqrt{2}\,\pi\varepsilon/5 \le \varepsilon$.
The proof is complete. □
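
For concreteness, here is a hedged end-to-end sketch of regularized spectral clustering in Python with NumPy. It follows the recipe analyzed above (add $\tau$ to every entry with $n\tau$ equal to the average degree, form the Laplacian, take the eigenvector of the second largest eigenvalue), but the function name, the classification of nodes by the sign of the eigenvector entries, and the parameter values are illustrative assumptions, not a definitive implementation.

```python
import numpy as np

def regularized_spectral_clustering(A):
    """Split nodes into two groups by the sign of the second eigenvector
    of the regularized Laplacian, with n * tau equal to the average degree."""
    n = A.shape[0]
    tau = A.sum() / n**2                       # so that n * tau is the average degree
    A_tau = A + tau                            # add tau to every entry
    deg = A_tau.sum(axis=1)
    L = A_tau / np.sqrt(np.outer(deg, deg))    # D^{-1/2} A_tau D^{-1/2}
    _, vecs = np.linalg.eigh(L)                # eigenvalues in ascending order
    v = vecs[:, -2]                            # eigenvector of the second largest eigenvalue
    return (v > 0).astype(int)

rng = np.random.default_rng(6)
n, a, b = 1000, 40, 5                          # hypothetical parameters with a strong signal
labels = np.repeat([0, 1], n // 2)
P = np.where(labels[:, None] == labels[None, :], a / n, b / n)
upper = np.triu(rng.random((n, n)) < P, k=1)
A = (upper | upper.T).astype(float)

pred = regularized_spectral_clustering(A)
agreement = np.mean(pred == labels)
accuracy = max(agreement, 1 - agreement)       # communities are identified only up to a swap
print(accuracy)
```

The eigenvector is determined only up to sign, which is why the accuracy is computed up to a swap of the two labels; this mirrors the minimum over $\beta = \pm 1$ in the statement of the corollary.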